Nuclear Magnetic Resonance and Dynamic Characterization of the Intrinsically Disordered HIV-1 Tat Protein

BY Shaheen Shojania

A Thesis Submitted to the Faculty of Graduate Studies in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY Department of Chemistry University of Manitoba © July 30, 2007

Abstract

The HIV-1 transactivator of transcription (Tat) is a protein essential for both viral gene expression and virus replication. Tat is an RNA-binding protein that, in cooperation with host cell factors cyclin T1 and cyclin-dependent kinase 9, regulates transcription elongation. Tat also interacts with numerous other intracellular and extracellular proteins, and is implicated in a number of pathogenic processes. The Tat protein is encoded by two exons and is 101 residues in length. The first exon encodes a 72-residue molecule that activates transcription with the same proficiency as the full-length protein. The physicochemical properties of Tat make it a particularly challenging target for structural studies: Tat contains seven cysteine residues, six of which are essential for transactivation, and is highly susceptible to oxidative cross-linking and aggregation. In addition, a basic segment (residues 48-57) gives the protein a high net positive charge of +12 at pH 7, endowing it with a high affinity for anionic polymers and surfaces. In order to study the structure of Tat, both alone and in complex with partner molecules, we have developed a system for the bacterial expression and purification of polyhistidine-tagged and isotopically enriched (in ¹⁵N and ¹⁵N/¹³C) recombinant HIV-1 Tat1−72 (BH10 isolate) that yields large amounts of protein. These preparations have facilitated the assignment of 95% of the non-proline backbone resonances using heteronuclear 3-dimensional nuclear magnetic resonance (NMR) spectroscopy. Analysis by mass spectrometry and NMR demonstrates that the cysteine-rich Tat protein is unambiguously reduced and monomeric in aqueous solution at pH 4. NMR chemical shifts and coupling constants suggest that it exists in a disordered conformation. Line broadening and multiple peaks in the cysteine-rich and core regions suggest that transient folding occurs in two of the five sequence domains. NMR ¹⁵N-relaxation parameters were measured and analysed by spectral density and model-free approaches, both confirming the lack of structure throughout the length of the molecule. The absence of a fixed conformation and the observation of fast dynamics are consistent with the ability of the Tat protein to interact with a wide variety of proteins and nucleic acid, lending further support to the concept that Tat exists as an intrinsically disordered protein.


For Pamela. There are no words to describe my sense of gratitude and love for her friendship, love and support.


Science is a wonderful thing if one does not have to earn one’s living at it. Albert Einstein


Contents

List of Figures
List of Tables
Copyrighted Material
Acknowledgments
Abbreviations

1 Introduction
  1.1 Intrinsically Disordered Proteins
    1.1.1 The Origin of the Structure-Function Paradigm
    1.1.2 Discrepancies in the Structure-Function Paradigm
    1.1.3 Discovery of Intrinsic Disorder
  1.2 Classifications of Disorder and the Protein Trinity
  1.3 Intrinsic Disorder and Function
    1.3.1 Protein-Chameleon
  1.4 Disorder Prediction
  1.5 Detecting and Characterizing Disorder
  1.6 The Human Immunodeficiency Virus
  1.7 The HIV-1 Trans-Activator of Transcription
  1.8 NMR Investigation of the Structure and Dynamics of Tat

2 Spectral Densities, Relaxation and Dynamics in Nuclear Magnetic Resonance Spectroscopy
  2.1 Semi-Classical Description of Relaxation
    2.1.1 The Master Equation of Relaxation
    2.1.2 The Master Equation in Operator Form
    2.1.3 Time Evolution of a Physical Variable
  2.2 Relaxation and Dipolar Coupling
    2.2.1 Unlike Spins
    2.2.2 Longitudinal Relaxation
    2.2.3 Transverse Relaxation
    2.2.4 Orientational Spectral Densities and Spherical Harmonics
  2.3 Chemical Shift Anisotropy
  2.4 The Steady-State Heteronuclear Nuclear Overhauser Effect
  2.5 Lipari-Szabo Model-Free Formalism
  2.6 Relaxation in the Rotating Frame

3 Materials and Methods
  3.1 Plasmid construction
  3.2 Expression of unlabelled His-tagged Tat1−72
  3.3 Expression of ¹³C/¹⁵N-His-tagged Tat1−72
  3.4 Purification of His-tagged Tat1−72
  3.5 MALDI-TOF-MS
  3.6 NMR Sample Preparation
  3.7 NMR HSQC Acquisition
  3.8 NMR Backbone Assignments
  3.9 NMR Relaxation Measurements
  3.10 Relaxation Data Analysis
  3.11 pH and Hydrogen Exchange

4 Results
  4.1 Protein Expression and Purification
  4.2 Monomer Identification: MALDI-TOF-MS
  4.3 NMR Spectroscopy and Resonance Assignments
  4.4 Chemical Shifts and ³J(HN-Hα) Coupling Constants
  4.5 NMR Relaxation
  4.6 Spectral Density Mapping
  4.7 Model-Free Analysis
  4.8 pH Effects
  4.9 Disorder Predictions

5 Discussion
  5.1 Protein Expression and Purification
  5.2 NMR Spectroscopy and Backbone Assignment
  5.3 Chemical Shifts and Coupling Constants
  5.4 NMR Relaxation
  5.5 Reduced Spectral Density Mapping
    5.5.1 J(0.87ωH)
    5.5.2 J(ωN)
    5.5.3 Jeff(0)
  5.6 Model-Free Analysis
  5.7 pH Effects
  5.8 Disorder Predictions

6 Conclusions

Bibliography

Appendices
  A Resonance Assignments for His-tagged Tat1−72
  B Model-Free Parameter Estimates for His-tagged Tat1−72

List of Figures

1.1 Variation and classification of levels of protein disorder
1.2 The Protein Trinity
1.3 Classification scheme for IDP Function
1.4 A Protein-Chameleon
1.5 NMR of the Folded and Unfolded state of drkN SH3
1.6 General features of the HIV-1 virion
1.7 Open reading frames of the HIV genome
1.8 General features of the HIV life-cycle
1.9 The HIV-1 Tat sequence encoded by exon 1
1.10 The Tat-TAK-TAR association
2.1 Energy level diagram showing transition frequencies for the two spin-1/2 system
2.2 Energy level diagram showing transition probabilities for the two spin-1/2 system
2.3 Magnetic field vectors in the rotating frame resulting from a selective spin-lock for a nucleus with Larmor frequency ω0
4.1 Amino acid sequence of His-tagged Tat1−72
4.2 MALDI-TOF-MS identification of monomeric unlabelled His-tagged Tat1−72
4.3 Amide backbone regions of ¹H/¹⁵N-HSQC spectrum for naturally abundant ¹⁵N in unlabelled His-tagged Tat1−72
4.4 ¹H/¹⁵N-HSQC resonance assignments of His-tagged Tat1−72
4.5 Intensity profile of ¹H/¹⁵N-HSQC backbone resonances for His-tagged Tat1−72
4.6 Strip plots from HN(CA)CO and HNCACB spectra of His-tagged Tat1−72
4.7 ¹H/¹⁵N-HSQC resonance assignments of His-tagged Tat1−72
4.8 Difference plots for His-tagged Tat1−72 chemical shifts and ³J(HN-Hα) coupling constants from the random coil
4.9 THRIFTY estimation of the extended disordered state of His-tagged Tat1−72
4.10 Sample spectra for the steady state heteronuclear ¹H-¹⁵N NOE of His-tagged Tat1−72
4.11 Relaxation measurements of the His-tagged Tat1−72 protein at pH 4.1 and 293 K, determined at 14.1 T and 18.8 T field strengths
4.12 Sample spectra for T1 and T1ρ relaxation series for His-tagged Tat1−72
4.13 Sample fits for T1 of Gly-68 measured at 14.1 T and 18.8 T field strengths
4.14 Sample fits for T1ρ of Gly-68 measured at 14.1 T and 18.8 T field strengths
4.15 Transverse relaxation rates (R2) for His-tagged Tat1−72 determined at 14.1 T field strength
4.16 Sample spectra for T2 relaxation series for His-tagged Tat1−72 at 14.1 T field strength
4.17 Sample fit for T2 of Gly-68 measured at 14.1 T field strength
4.18 Reduced spectral density mapping of motions for His-tagged Tat1−72 at pH 4.1 and 293 K
4.19 Field dependent conformational exchange rates for His-tagged Tat1−72 at pH 4.1 and 293 K
4.20 Jeff(0) spectral density maps determined for His-tagged Tat1−72 at 14.1 T and 18.8 T field strengths separately and combined
4.21 Model-free parameter estimates using Model 2 (Rf = 0.227) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields
4.22 Model-free parameter estimates using Model 3 (Rf = 0.136) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields
4.23 Model-free parameter estimates using Model 7 (Rf = 0.098) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields
4.24 Variation in chemical shift and intensity of ¹H/¹⁵N-HSQC with increasing pH
4.25 Predicted amide hydrogen exchange rates for His-tagged Tat1−72
4.26 Variation in absolute peak heights with increasing pH for observed glycine residues
4.27 Variation in absolute peak heights with increasing pH for selected serine and threonine residues
4.28 Decrease in calculated net charge with increasing pH for His-tagged Tat1−72
4.29 DisProt disorder predictions of amino acid sequence for His-tagged Tat1−72
4.30 PONDR disorder predictions for the His-tagged Tat1−72 amino acid sequence
4.31 RONN disorder predictions for the His-tagged Tat1−72 amino acid sequence
4.32 IUPred disorder predictions for the His-tagged Tat1−72 amino acid sequence
5.1 Variation in the theoretical relaxation rates and steady-state heteronuclear NOE with overall rotational correlation time
B.1 Single Field model-free parameter estimates and Monte Carlo error estimates using Model 2 with relaxation data collected at 600 MHz field strength
B.2 Single Field model-free parameter estimates and Monte Carlo error estimates using Model 2 with relaxation data collected at 18.8 T field strength

List of Tables

1.1 Intrinsically disordered proteins (IDPs) and their functions
1.2 Moonlighting Proteins
1.3 Predictors of protein disorder
2.1 Tensor Operators for the Dipolar Interaction
2.2 Commutation Relations
2.3 Tensor Operators for the CSA Interaction
3.1 M9 Minimal Medium ingredients
3.2 Protein purification buffers
3.3 Acquisition parameters for the NMR experiments
3.4 Models tested using Lipari-Szabo and Cole-Cole model-free methods
4.1 Range and average values for the reduced spectral density mapping of His-tagged Tat1−72 at pH 4.1 and 293 K
4.2 R-factors and mean Akaike Information Criterion values for model-free estimates of dynamics parameters
A.1 Resonance assignments of Histidine-tagged Tat1−72
A.2 Additional assignments of resonances from ¹H/¹⁵N-HSQC of ¹³C/¹⁵N labelled Histidine-tagged Tat1−72

List of Copyrighted Material

The following material has been reproduced or adapted with permission of the copyright holder or author:

• Figure 1.1 on page 7: adapted from Journal of Molecular Recognition, 18(5):343–384, V. N. Uversky, C. J. Oldfield, and A. K. Dunker, “Showing your ID: intrinsic disorder as an ID for recognition, regulation and cell signaling”, Figure 1. Different levels of order and disorder, Copyright (2005) with permission of V. N. Uversky.

• Figure 1.3 on page 11: adapted from FEBS Letters, 579(15):3346–3354, P. Tompa, “The interplay between structure and function in intrinsically unstructured proteins”, Figure 1. Functional classification scheme of IUPs, Copyright (2005) with permission of P. Tompa.

• Table 1.1 on page 12: adapted from Trends in Biochemical Sciences, 27(10):527–533, P. Tompa, “Intrinsically unstructured proteins”, Table 1. Intrinsically unstructured proteins (IUPs) and domains, Copyright (2002); FEBS Letters, 579(15):3346–3354, P. Tompa, “The interplay between structure and function in intrinsically unstructured proteins”, Table 1. Functional classification of IUPs, Copyright (2005) with permission of P. Tompa.

• Table 1.2 on page 16: adapted from Trends in Biochemical Sciences, 30(9):484–489, P. Tompa, “Structural disorder throws new light on moonlighting”, Table 1. Examples of disordered moonlighting proteins, Copyright (2005) with permission of P. Tompa.

• Figure 1.4 on page 17: reproduced from Journal of Molecular Recognition, 18(5):343–384, V. N. Uversky, C. J. Oldfield, and A. K. Dunker, “Showing your ID: intrinsic disorder as an ID for recognition, regulation and cell signaling”, Figure 13. Being a protein-chameleon, Copyright (2005) with permission of V. N. Uversky.

• Table 1.3 on page 19: reproduced from Journal of Molecular Recognition, 18(5):343–384, V. N. Uversky, C. J. Oldfield, and A. K. Dunker, “Showing your ID: intrinsic disorder as an ID for recognition, regulation and cell signaling”, Table 1. Protein disorder predictors, Copyright (2005) with permission of V. N. Uversky.

• Figure 1.5 on page 24: reproduced from Biochemistry, 36(9):2390–2402, N. A. Farrow, O. Zhang, J. D. Forman-Kay and L. E. Kay, “Characterization of the backbone dynamics of folded and denatured states of an SH3 domain”, Supplementary Figure S1. Copyright (1997) with permission of L. E. Kay.

• Figure 1.10 on page 35: reproduced from J. Mol. Biol., 293(2):235–254, J. Karn, “Tackling Tat”, Figure 4. Recognition of Tar RNA by Tat and TAK, Copyright (1999), with permission from Elsevier and J. Karn.

Acknowledgments

This work was supported by grants from the Natural Sciences and Engineering Research Council of Canada and the University of Manitoba; it was initiated with funding from the Medical Research Council of Canada and the Manitoba Health Research Council. Funding for the 600 MHz spectrometer at the University of Manitoba was made possible by the Canada Foundation for Innovation.

I would like to thank the following people for making this research and thesis possible:

• Joe O’Neil for giving me the opportunity to work on this project, and for his advice, guidance, patience, support, humour, and kindness;
• James Peeling, Scott Kroeker, and Frank Hruska for their patience as my committee throughout this long process;
• Anthony S. Secco, Hyman D. Gesser, and Arthur Chow for giving me my first experiences of research at the University of Manitoba during my undergraduate and graduate career;
• Leo Spyracopoulos (University of Alberta) for many helpful discussions, introducing me to NMR and structural biology, giving me the direction I needed, and for providing the Mathematica notebooks for the single field spectral density and Lipari-Szabo calculations on which all of my subsequent notebooks were based;
• Kaveh Shojania for a lifetime of support, encouragement and advice (some of which I followed);
• Ted Schaefer for giving me an idea of what it was all about;
• Vincent C. Chen and Hélène Perreault (University of Manitoba) for mass spectral data collection and analysis;
• Gillian D. Henry (Tufts University) for all of her work in constructing the Tat expression plasmid;
• Kirk Marat for assistance and training at the NMR facility at the University of Manitoba;
• Ryan McKay for the acquisition of NMR data on the 800 MHz spectrometer at NANUC (University of Alberta) and for helpful advice on data acquisition;
• Frank Delaglio (NIH) for use of some of his unreleased NMRPrime scripts to help in the assignment of the Tat protein;
• Lucio Frydman (Weizmann Institute of Science), Pei Zhou and Brian Coggins (Duke University), Thomas Szyperski (SUNY, Buffalo), Ray Freeman (Jesus College, Cambridge) and Eriks Kupče (Varian Inc.) for providing many pulse sequences that I tried in attempts to overcome acquisition problems;
• Markus Heller (University of British Columbia) for many helpful discussions on NMR and his assistance and advice in preparing this thesis;
• the LaTeX community;
• Lawrence MacIntosh of the Biochemistry Department at the University of British Columbia for providing me with an office and the support of his research group during the writing of this thesis;
• Walter Englander (University of Pennsylvania) for providing the Excel files for calculating hydrogen exchange rates;
• Richard Sparling of the Department of Microbiology, University of Manitoba and the members of his lab for the use of their glove bag and degassing equipment;
• Mark Berjanski (University of Alberta) for verifying some of my conclusions;
• Thach N. Vo for finally running the gels that I would never do;
• Julian Saba (University of Montreal) for providing me with the motivational words of wisdom that got me through the last few years;
• Jamie Galka for giving me someone to vent with and for always providing me with a good laugh;
• the students, faculty and staff of the Department of Chemistry of the University of Manitoba that I have had a chance to know;
• the University of Manitoba and the Faculty of Graduate Studies for funding;
• my parents for love and support all of my life;
• Pamela, Evan and Elliot for their patience, love and support.

Abbreviations

4EPB        eukaryotic translation initiation factor 4E binding protein
6×His       hexahistidine
AIC         Akaike information criterion
AIDS        acquired immunodeficiency syndrome
BBB         blood-brain barrier
βME         β-mercaptoethanol
BSA         bovine serum albumin
CA          capsid protein
CC          Cole-Cole
CD          circular dichroism
CDK9        cyclin-dependent kinase 9
CFTR        cystic fibrosis transmembrane conductance regulator
cm          centimetre
CNS         central nervous system
CPMG        Carr-Purcell-Meiboom-Gill
CREB        cAMP response element binding protein
CSA         chemical shift anisotropy
CSI         chemical shift index
CTD         carboxy terminal domain
CaMKIV      Ca²⁺/calmodulin-dependent protein kinase IV
Cdk         cyclin-dependent kinase
Da          dalton
DD          dipole-dipole
deg         degrees
DHPR        dihydro-pyridine receptor
dmol        decimole
DNA         deoxyribonucleic acid
DNase I     deoxyribonuclease I
drkN SH3    N-terminal SH3 domain from the adapter protein drk
Dsp         desiccation stress protein
DSS         2,2-dimethyl-2-silapentane-5-sulfonate
DTT         dithiothreitol
E. coli     Escherichia coli
EBD         entropic bristle domain
EBV-SM      Epstein-Barr Virus nuclear protein BS-MLF1
EMBL        European Molecular Biology Laboratory
FG          Phenylalanine-Glycine
FlgM        flagellar anti-σ factor
g           gram
g           acceleration due to gravity
Gag         group-specific antigen
Gdn-HCl     guanidine hydrochloride
gp120       glycoprotein 120
gp41        glycoprotein 41
HAD         HIV-associated dementia
HAT         histone acetyl transferase
His-tag     hexahistidine affinity tag
HIV         human immunodeficiency virus
HIVE        HIV-associated encephalitis
HSQC        heteronuclear single quantum coherence
Hexim1      hexamethylene bisacetamide-inducible protein 1
HX          hydrogen exchange
Hz          hertz
IDP         intrinsically disordered protein
ILK         integrin-linked kinase
IN          integrase
INEPT       insensitive nuclei enhanced by polarization transfer
IPTG        isopropyl-β-D-thiogalactopyranoside
J           joule
K           kelvin
kDa         kilodalton
kHz         kilohertz
kV          kilovolt
L           litre
LS          Lipari-Szabo
LS(ext)     Lipari-Szabo extended
LTR         long terminal repeat
M           molar
MA          matrix complex
MALDI       matrix-assisted laser desorption-ionization
MAP2        microtubule-associated protein 2
MARK        microtubule-affinity regulating kinase
MDM2        mouse double minute 2
MDa         megadalton
MES         2-(N-morpholino)ethanesulfonic acid
MHC         major histocompatibility complex
MHz         megahertz
mL          millilitre
mM          millimolar
mRNA        messenger RNA
mg          milligram
ms          millisecond
MS          mass-spectrometry
µg          microgram
µL          microlitre
µM          micromolar
µs          microsecond
MW          molecular weight
m/z         mass-to-charge ratio
N-TEF       negative transcription elongation factor
NACP        non-A beta component of Alzheimer’s disease amyloid plaque
NC          nucleocapsid protein
NCBI        National Center for Biotechnology Information
Nef         negative factor
NIAID       National Institute of Allergy and Infectious Diseases
NIH         National Institutes of Health
nM          nanomolar
NMR         nuclear magnetic resonance
NOE         nuclear Overhauser effect
NPCs        nuclear pore complexes
ns          nanosecond
Nup         nuclear porin
P-TEFb      positive transcription elongation factor b
PCAF        p300/CBP-associated factor
PCR         polymerase chain reaction
PDB         Protein Data Bank
PEVK        Proline, Glutamate, Valine and Lysine rich region
pI          isoelectric point
PIAS1       protein inhibitor of activated STAT1
PKA         cAMP-dependent protein kinase
PP1         protein phosphatase 1
ppm         parts-per-million
PR          protease
PRP         proline-rich protein
ps          picosecond
RGD         Arginine-Glycine-Aspartate
RNA         ribonucleic acid
RNAPII      RNA Polymerase II
RNase I     ribonuclease I
RPA         replication protein A
RT          reverse transcriptase
Rev         regulator of expression of virion proteins
s           second
SAXS        small angle X-ray scattering
SD          standard deviation
SDS-PAGE    sodium dodecyl sulfate polyacrylamide gel electrophoresis
SNAP-25     syntaxin and synaptosomal-associated protein of 25 kDa
SPE         solid phase extraction
STAT        signal transducer and activator of transcription
SU          surface unit complex
SW          sweep width
T           temperature
T           tesla
TAD         transactivator domain
TAF         TATA-box associated factors
TAK         Tat-associated kinases
TAR         trans-activation response
TB          Terrific Broth
TCEP        tris(2-carboxyethyl)phosphine
TCP         tris(2-cyanoethyl)phosphine
TFA         trifluoro-acetic acid
THP         tris(hydroxypropyl)phosphine
TM          transmembrane complex
TOF         time-of-flight
Tat         trans-activator of transcription
Tris-HCl    tris(hydroxymethyl)aminomethane hydrochloride
UCU         uridine-cytidine-uridine
UV          ultraviolet
Vif         viral infectivity factor
Vpr         viral protein R
Vpu         viral protein U
WH2         Wiskott–Aldrich syndrome protein homology domain 2

Chapter 1

Introduction

1.1 Intrinsically Disordered Proteins

For most of the last century, it was generally accepted that a protein must adopt a well-defined tertiary structure to achieve its functional native state, and that proteins and protein domains that lacked secondary structural motifs were without function. The idea that a well-defined three-dimensional structure was a prerequisite for protein function came to be referred to as the structure-function paradigm. Over the last 20 years, there has been increasing evidence of proteins that exist in partially folded, unfolded and molten globule states and that have functional importance. These observations, along with the increasing numbers of proteins that are being discovered to be intrinsically disordered or partially folded in proteomics and bioinformatics research, have led to a re-assessment of the structure-function paradigm [1].

In this section a brief introduction to the origin of the structure-function paradigm, along with some of the arguments for its re-assessment, will be presented. Examples of key cellular processes illustrating the functional importance of the disordered state will be described along with the functional classification of disordered proteins.

1.1.1

The Origin of the Structure-Function Paradigm

The origin of the structure-function paradigm is not clearly understood. However, a detailed review of the literature by Dunker et al. [2] outlines some of the pivotal work that led to the development of the structure-function idea.

• Schloss und Schlüssel (lock and key): Emil Fischer (1894) observed that extracellular extracts of beer yeast, containing invertase, hydrolyzed α-glucosides but not β-glucosides, while emulsin hydrolyzed the β-glucosides but not the α-glucosides [3]. The translation (by Lemieux and Spohr) [4] of the conclusion to these observations stated: “To use an image, I would say that the enzyme and glucoside have to fit each other like a lock and key in order to exert a chemical effect on each other”.

• Hsien Wu (1931) hypothesized that denaturation corresponded to protein unfolding, as opposed to chemical alteration of the protein. Wu proposed that denaturation involved a transition from a compact ordered structure to a more flexible disordered structure and resulted in exposure of the amino acid side-chains to the solvent [2, 5]. Wu’s work was not well known at the time, but its importance was realized later following the independent work of Mirsky and Pauling [6].

• Mirsky and Pauling (1936) published a survey on the structure of the native, denatured and coagulated states of proteins. Their review compiled the following observations [6]: loss of pepsin activity correlated with the amount of protein denatured; acid, alkali and urea all increased the viscosity of protein solutions and denatured the proteins without aggregation; many native proteins form crystals while denatured proteins do not crystallize; denaturation is typically accompanied by exposure of sulfhydryls and other side-chain groups. They concluded: “The characteristic specific properties of native proteins we attribute to their uniquely defined configurations. The denatured protein molecule we consider to be characterized by the absence of a uniquely defined configuration.” “It is evident that with loss of the uniquely defined configuration there would be loss of the specific properties of the native protein”. [6]

The following decades provided numerous studies that identified loss of function upon denaturation of proteins and formed the foundation for the view that ordered structure is a necessary condition for protein function. The identification of the α-helix and β-strand in 1951 by Pauling et al. [7–9] provided the structural units to which biological activity was attributed. In 1959, Kauzmann published an extensive review on protein denaturation [10, 11] in which the idea of the hydrophobic effect as the governing force in protein folding was outlined. This idea of hydrophobicity took time to take root, but eventually became widely used as the explanation for structural containment and biological function [11]. By the 1960s, when the atomic-resolution structures of myoglobin [12] and lysozyme [13] had been determined, it was already generally accepted that a necessary condition for protein function was a specific folded 3D structure [2]. The disruption of hydrogen bonds in denatured proteins would result in a loss of function. However, studies such as these say nothing about proteins that lack well-folded structure in the absence of denaturant.

The above studies of protein structure, denaturation and relationships to function may have played a key role in the development of the structure-function paradigm, but perhaps also influential was the discovery of the double helix as the structure of DNA [14, 15].
Watson and Crick’s 1953 papers proposing a structural model for DNA established the basis for the transfer of genetic information, which later became the central dogma of molecular biology. To have such a complex set of biological phenomena explained by a structural model must have had great influence on the biological community. The more than 42,000 protein structures now available have obscured alternatives to the structure-function paradigm. However, most of these structures are similar to each other (only 1028 unique folds) and have recognisable sequence similarity to only a small fraction of the proteins in nature [2].

1.1.2

Discrepancies in the Structure-Function Paradigm

Karush reported in 1950 [2, 16] that, contrary to the behaviour of every other native protein known at the time, serum albumin demonstrated an outstanding capacity for the formation of reversible, high-affinity complexes with a variety of ions and molecules of diverse configurations. With arguments similar to Fischer’s for the lock and key [2, 3], Karush inferred that the binding sites of albumin assume a large number of configurations of similar energy in equilibrium with each other. In the presence of an anion, the configuration adopted is the one that is stabilized through specific interactions with that anion (allowing the anion to interact with appropriate residues of the polypeptide). In other words, upon interaction with the anion, the best configuration is selected from albumin’s structural ensemble. Karush referred to this phenomenon as ‘configurational adaptability’.

In 1958, Koshland independently proposed a concept very similar to configurational adaptability, which was later called the ‘induced-fit’ theory [2, 17]. In his examination of enzyme reactivity and specificity, Koshland [17] postulated that: “(a) a precise orientation of catalytic groups is required for enzyme action; (b) the substrate may cause an appreciable change in the three-dimensional relationship of the amino acids at the active site; and (c) the changes in protein structure caused by a substrate will bring the catalytic groups into the proper orientation for reaction, whereas a non-substrate will not”. However, Koshland did not propose a mechanism for induced fit and left open the question of whether binding induces the conformation or whether the conformation is selected as the best fit from an ensemble of structures in equilibrium.

In 1978, Bennett and Steitz reported evidence of significant domain movement accompanying glucose-induced conformational changes in yeast hexokinase [2, 18]. In this study, the authors proposed two possible functional roles for the flexibility and conformational change: as an “embracing” mechanism to surround the substrate, or as a discriminating mechanism against water as a substrate [18].

1.1.3

Discovery of Intrinsic Disorder

According to the Protein Data Bank (PDB) [19], only 41 protein structures were known by 1978. At least two of these structures showed that certain segments of a protein, known to be essential for function, yielded no apparent electron density [2, 20, 21]. The absence of electron density in protein structures can result from failure to solve the phase problem, crystal defects, or proteolytic degradation during purification. However, the most common reason for missing electron density is that the unobserved atom, side-chain, residue or region fails to scatter X-rays coherently because its atomic positions differ (are disordered) from one protein molecule to the next in the crystal [2].

Also in 1978, Aviles et al., using nuclear magnetic resonance (NMR), noted disorder in the highly charged functional tail of histone H5 [22]. By the 1990s, NMR had revealed functional proteins that lacked any identifiable structure (disordered from end to end) [23–25]. Unlike X-ray diffraction, where the absence of electron density may indicate disorder, NMR provides evidence of disorder through chemical shift dispersion, peak widths, relaxation times and heteronuclear nuclear Overhauser effects (NOEs) [26, 27].

1.2

Classifications of Disorder and the Protein Trinity

Before introducing the classification scheme for protein disorder, it is prudent to introduce and clarify the terminology. Historically, the term ‘natively denatured’ [28] was introduced to distinguish extremely flexible proteins from normal globular proteins. Slightly earlier, the term ‘rheomorphic’ was used as an alternative to the term ‘random coil’ [29, 30], which originated from the study of polymers. The term ‘natively unfolded’ [31] was introduced to describe a non-compact protein state that lacked secondary structure under physiological conditions. More recently, the term ‘intrinsically unstructured’ [1] has been used frequently to describe proteins and protein domains that lack secondary structure. Uversky et al. [29] outline the various combinations of these terms used to describe proteins that do not possess a rigid 3D structure. The authors proposed the term ‘intrinsically disordered’ as a means of acknowledging that a disordered protein is not without structure, but exists as an ensemble of interconverting structures [29]; neither is it ‘unfolded’, since this term implies that under some set of physiological conditions or circumstances the protein would fold, which is not necessarily true. In addition to proposing the appropriate language for describing disordered proteins, Uversky et al. [29] propose a classification scheme for the various levels of disorder found in proteins, outlined in Figure 1.1. This classification scheme also accounts for the ‘molten globule’ state of proteins (partially disordered but still collapsed) proposed by Ohgushi and Wada [32].

Figure 1.1: Variation and classification of levels of protein disorder: (a) no disorder; (b) disordered termini; (c) disordered linker; (d) disordered loop; (e) disordered domain; (f) disordered protein with some residual structure; (g) wholly disordered, mostly collapsed protein; and (h) wholly disordered, extended protein. Adapted from [29] with permission of V. N. Uversky.

The identification of the molten globule state of proteins [32] resulted in the reworking of the two-state model of protein folding [33–35]. Although it took some time to be generally accepted, it has come to be understood that not all globular proteins undergo a cooperative transition from the unfolded state (U) to the folded or native state (N) without any stable intermediates. Ptitsyn and Uversky [36] proposed that proteins along the protein folding landscape may exist in stable molten globule states. In a subsequent study of β-lactamase, Ptitsyn and Uversky [37] identified a fourth state which they termed ‘partially folded’, although it has since become known as the pre-molten globule state [38] because it is less compact than the molten globule state but more compact than the completely unfolded state.

The identification of the molten globule as a thermodynamically stable state [36] led to a re-evaluation of the structure-function paradigm by Dunker and Obradovic, who proposed the protein trinity hypothesis (Figure 1.2) [39]. According to Dunker and Obradovic, native proteins can be in one of three states: the ordered (folded) state, the liquid-like collapsed-disordered state (molten globule), and the extended-disordered state (random coil). Function may arise in any one of these three states or from transitions between the states [2, 39]. Uversky later extended this hypothesis to include the pre-molten globule state, an extension he referred to as the protein quartet [40].

Figure 1.2: The Protein Trinity of native and functional states of proteins [2, 39]: ordered (folded), collapsed (molten globule) and extended (random coil). Proteins may exist and function in the ordered, collapsed-disordered, or extended-disordered states as well as in transitions between these states.

1.3

Intrinsic Disorder and Function

A literature review of intrinsically disordered proteins (IDPs) and disordered protein domains with identified function listed 90 proteins involved in 28 distinct functions [41]. It has been suggested that these functional roles can be divided into six broad categories [42]: entropic chains, effectors, scavengers, assemblers, chaperones and display sites [42–44]. In all of the categories except the entropic chains, interactions between the disordered segment and its target typically result in some degree of disorder-to-order transition [42].

Entropic chains are a unique category of IDPs that emphasizes the contradiction of the old structure-function paradigm: the function of an entropic chain requires disorder. Entropic chains can be categorized as linkers/spacers, bristles, and brushes. In all cases, the disordered segment remains disordered as it functions [2, 41–46]. Entropic chains can serve multiple purposes. In some cases, the disordered protein segment serves as a linker or spacer between ordered domains in a multi-domain protein. The linker/spacer then regulates the distance between the adjacent domains and enables conformational freedom in orientational searches [42, 43]. An example of this type of entropic chain is found in replication protein A (RPA), where the N-terminal 108 residues form a five-stranded β-barrel capped by two small helices, followed by a 60-residue flexible linker to its DNA-binding domain [47].

The entropic bristles, or brushes, operate via steric repulsion or excluded-volume effects. In this manner, entropic bristle domains (EBDs) can regulate pores, channels, or active sites by rapidly adopting many conformations, restricting entrance until the EBD has been modified (e.g., by phosphorylation) [45]. An example of this sort of gating is found in nuclear pore complexes (NPCs), where it is performed by nucleoporins (Nups) with large phenylalanine-glycine (FG) repeats [48]. In the case of the entropic brush, the excluded-volume principle operates in a similar manner to control the spacing of larger proteins or complexes. For example, neurofilament separation is regulated by EBD sidearms along the core filament. The thermally driven motions of the EBDs give each filament a much larger effective volume [49]. The spacing of the filaments in bundling is therefore regulated by these thermally driven motions, which can in turn be regulated by the degree of phosphorylation of the EBDs [50].

The effector, scavenger, assembler, chaperone and display-site classes of protein are classified according to their degree of disorder-to-order transition and their binding interaction. Figure 1.3 shows how the intrinsically disordered proteins are separated into the appropriate functional class according to whether the protein continues to move freely through its conformational space (no disorder-to-order transition) or undergoes some disorder-to-order transition upon binding [43]. Those proteins involved in target binding are further separated depending on whether the interaction is permanent or transient [43]. Table 1.1, adapted from [42] and [43], contains a limited set of examples of each class of IDP.

Figure 1.3: IDP functional classification relates directly to their ability to move freely through a large conformational space (entropic chains), or to the lifetime of binding to their target. Adapted from [43] with permission of P. Tompa.

[Diagram: IDPs divide into entropic chains (function directly due to disorder, as a spring, bristle, or linker) and molecular recognition. Molecular recognition divides into transient binding (display sites: sites of post-translational modification; chaperones: assist the folding of RNA or protein) and permanent binding (effectors: modulate the activity of a partner molecule; assemblers: assemble complexes or target activity; scavengers: store and/or neutralize small ligands).]

Table 1.1: Examples of intrinsically disordered proteins (IDPs) and domains with target/partner (if applicable) and function. Adapted from [42] and [43] with permission of P. Tompa.

IDP (protein/domain) | Target/partner | Function/action

Entropic chains
Microtubule-associated protein 2 (MAP2) projection domain | Not applicable | Entropic bristle (spacing in microtubule architecture)
Titin PEVK domain | Not applicable | Entropic spring (passive contractile force in muscle)
SNAP-25 linker region | Not applicable | Flexible spacer/linker of binding domains

Effectors
Calpastatin | Ca2+-activated protease (calpain) | Inhibitor of calpain in Ca2+ signalling
p21/27 | Cyclin-dependent kinases | Kip/Cip class inhibitors in cell cycle regulation
4EBP1, 2, 3 | Eukaryotic translation initiation factor (eIF4E) | Inhibitor of translation initiation
Securin | Separase | Inhibitor of chromosome separation before anaphase in mitosis
FlgM | Sigma 28 transcription factor | Inhibitor of flagellin-specific gene expression in bacteria
Stathmin | Tubulin dimers | Microtubule disassembly, catastrophe

Scavengers
Thymosins (proTα) | Zn2+, histone | Not reported
Caseins | Calcium phosphate | Nanocluster formation, inhibition of precipitation in milk
Salivary proline-rich protein (PRP) | Tannin | Binding/neutralization of polyphenolic plant compounds
Desiccation stress protein (Dsp) 16 | Water | Retention of water to prevent desiccation of plants

Assemblers
MAP2 microtubule-binding domain | Tubulin dimers | Microtubule polymerization, bundling
Caldesmon | Ca2+ calmodulin, F-actin, myosin, tropomyosin | Actin polymerization, bundling
Bob1 | Oct1 transcription factor, Igκ promoter, TAFII105 | B-cell-specific expression of immunoglobulin genes
λ phage N protein | mRNA, NusA, RNA Pol II | Transcription anti-termination
SIBLING proteins | Integrin, complement factor H, CD44, fibronectin | Assembly of bone extracellular matrix
Fibronectin receptor (MSCRAMM) D1-D4 | Fibronectin | Adherence to extracellular matrix of host in bacterial invasion
CREB transactivator domain (TAD) | TATA-box-associated factors (TAFs), CREB-binding protein | Assembly of transcription preinitiation complex

Display sites
CREB TAD | Protein kinases (e.g. PKA, CaMKIV) | Regulation by phosphorylation
MAP2 microtubule-binding domain | Protein kinases (e.g. PKA, MARK) | Regulation by phosphorylation
Bcl-2 antiapoptotic protein (24–93) | Proteases (e.g. caspase) | In vivo proteolysis site

Chaperones
α-Synuclein (NACP) | | Protein chaperone
Casein | | Protein chaperone
Nucleocapsid protein 7/9 | | RNA chaperone
Ribosomal S12 | | RNA chaperone
Prion protein N-terminal domain | | RNA chaperone

Abbreviations: PEVK, Pro, Glu, Val and Lys rich region; SNAP-25, syntaxin and synaptosomal-associated protein of 25 kDa; 4EBP, eukaryotic translation initiation factor 4E binding protein; CREB, cAMP response element binding protein; PKA, cAMP-dependent protein kinase; CaMKIV, Ca2+/calmodulin-dependent protein kinase IV; MARK, microtubule-affinity regulating kinase; NACP, non-A beta component of Alzheimer’s disease amyloid plaque.

One advantage of the non-entropic chain IDPs is their ability to bind multiple partners and to have multiple functions. This binding ‘promiscuity’, which can modulate the activity of different targets, has been observed for several IDPs [2,51,52]. These proteins, commonly referred to as moonlighting proteins, have also been found to have opposing effects on the same target [51]. Some examples of moonlighting proteins are listed in Table 1.2 from [51].

Table 1.2: A selection of disordered moonlighting proteins with known opposing function. Adapted from [51] with permission of P. Tompa.

Protein (a) | One (inhibiting) function | Another (activating) function
Calpastatin | Inhibition of calpain | Activation of calpain
CFTR (R domain) | Inhibition of CFTR | Activation of CFTR
DHPR (peptide C) | Inhibition of RyR | Activation of RyR
EBV-SM | Down-regulation of intron-containing mRNA | Up-regulation of intron-less mRNA
MDM2 (180–298) | Down-regulation of p21Cip1 | Activation of estrogen receptor α
p21Cip1 and p27Kip1 | Inhibition of Cdk | Activation of Cdk
PIAS1 (392–541) | Inhibition of activated STAT | Activation of p53
I-2 | Inhibition of PP1 | Activation of PP1
Ribosomal L5 | Inhibition of MDM2 ubiquitin ligase | Activation (chaperoning) of ribosome
Securin | Inhibition of separase | Activation (chaperoning) of separase
Thymosin-β4 (WH2 domain) | Sequestration of G-actin | Activation of actin polymerization, ILK kinase

(a) Abbreviations: CFTR, cystic fibrosis transmembrane conductance regulator; DHPR, dihydropyridine receptor; EBV-SM, Epstein-Barr Virus nuclear protein BS-MLF1; MDM2, mouse double minute 2; Cdk, cyclin-dependent kinase; STAT, signal transducer and activator of transcription; PIAS1, protein inhibitor of activated STAT1; PP1, protein phosphatase 1; WH2, Wiskott–Aldrich syndrome protein homology domain 2; ILK, integrin-linked kinase.

1.3.1

Protein-Chameleon

An interesting example of the multiple functional roles of disordered proteins is the presynaptic protein α-synuclein, whose aggregation and fibrillation are implicated in the development of Parkinson’s disease [29, 53]. α-Synuclein can adopt several completely different structures depending on its environment. Its conformational plasticity allows it to be substantially disordered, to adopt a partly folded (amyloidogenic) conformation, to fold into either α-helical or β-sheet species (both monomeric and oligomeric), and to form aggregates with several different morphologies (spheres, doughnuts, amorphous aggregates, or amyloid-like fibrils) [29]. Figure 1.4, taken from [29], illustrates the many forms of this unusual protein.

Figure 1.4: A protein-chameleon, α-synuclein is able to adopt several completely different structures depending on its environment. Reproduced from [29] with permission of V. N. Uversky.

1.4

Disorder Prediction

A number of algorithms have been developed in recent years to predict the disordered segments of proteins based on amino acid properties (such as charge, hydropathy, secondary structure propensity, and flexibility index) and their frequency of occurrence throughout the protein [54–61]. Low sequence complexity (i.e., low variability of the 20 amino acids within a segment of the protein and repetition of amino acids) is often an indicator of protein disorder [29]. Disordered proteins often exhibit a compositional bias against bulky or nonpolar amino acids (i.e., low content of Val, Leu, Ile, Met, Phe, Trp, and Tyr) [62]. Because a high content of polar or charged residues tends to favour disorder, a higher proportion of Gln, Ser, Pro, Glu and Lys is usually observed [29]. Gly and Ala are often found in higher proportions because their small side chains favour flexibility [62]. Table 1.3 lists several of these algorithms and their sites for web-server disorder predictions, some of which have been used to analyze protein sequence databases of entire genomes. Sequence analysis using the DISOPRED2 algorithm showed that disordered segments of 30 or more consecutive residues occur in 2.0% of archaean, 4.2% of eubacterial and 33.0% of eukaryotic proteins [63].
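The compositional trends described above can be illustrated with a short calculation. The following Python sketch is not one of the published predictors; it only counts the residue classes named in the text and locates long runs in a per-residue disorder assignment. The function names, the input format (a one-letter sequence and a list of booleans) and the use of 30 residues as the default minimum length (borrowed from the DISOPRED2 survey quoted above) are assumptions made for illustration.

```python
from collections import Counter

# Residue groups named in the text (one-letter codes).
ORDER_PROMOTING = set("VLIMFWY")   # bulky or nonpolar: Val, Leu, Ile, Met, Phe, Trp, Tyr
DISORDER_PROMOTING = set("QSPEK")  # polar or charged: Gln, Ser, Pro, Glu, Lys


def composition_bias(sequence):
    """Return (order_fraction, disorder_fraction) for a one-letter sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    order = sum(counts[aa] for aa in ORDER_PROMOTING) / n
    disorder = sum(counts[aa] for aa in DISORDER_PROMOTING) / n
    return order, disorder


def long_disordered_segments(flags, min_length=30):
    """Return (start, end) pairs (0-based, end exclusive) for runs of True
    in `flags` that are at least `min_length` residues long.

    `flags` is a hypothetical per-residue assignment (True = predicted
    disordered), such as might be derived from any predictor's output.
    """
    segments = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i                      # a run of disorder begins
        elif not flag and start is not None:
            if i - start >= min_length:    # the run just ended; keep it if long enough
                segments.append((start, i))
            start = None
    if start is not None and len(flags) - start >= min_length:
        segments.append((start, len(flags)))  # run extends to the C-terminus
    return segments
```

A low order-promoting fraction together with a high disorder-promoting fraction is only suggestive; the predictors in Table 1.3 combine such features in more elaborate ways.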

Table 1.3: Predictors of protein disorder. Adapted from [29] with permission from V. N. Uversky.

Predictor | Web address | Reference
Charge-hydropathy plot | www.pondr.com/ | [64]
DisEMBL | http://dis.embl.de | [65]
DISOPRED | http://bioinf.cs.ucl.ac.uk/disopred/ | [63, 66, 67]
DISpro | www.ics.uci.edu/~baldig/diso.html | no literature ref.
DRIPpred | http://sbcweb.pdc.kth.se/cgi-bin/maccallr/disorder/submit.pl | no literature ref. (a)
FoldIndex© | http://bioportal.weizmann.ac.il/fldbin/findex | [68]
GlobPlot | http://globplot.embl.de/ | [65, 69]
IUPred | http://iupred.enzim.hu/ | [54, 55]
NORSp | http://cubic.bioc.columbia.edu/services/NORSp/ | [70]
PONDR® (b) | www.pondr.com/ | [39, 56–58, 71]
PreLink | http://genomics.eu.org/ | [72]
RONN | www.strubi.ox.ac.uk/RONN | [61]
SEG (c) | http://mendel.imp.univie.ac.at/METHODS/seg.server.html/ | [73]

(a) A description of the DRIPpred algorithm can be found at www.forcasp.org/paper2127.html
(b) PONDR® is a family of ID predictors, which includes VL-XT and VL3
(c) SEG is a predictor of low-sequence-complexity regions

1.5

Detecting and Characterizing Disorder

Detecting the presence of disorder in a protein can be achieved by a variety of experimental methods. Some methods are sensitive to the presence of disorder but do not yield any residue-specific information on the location of the disorder. This section briefly describes some of the more common methods used for detecting and characterising protein disorder; disorder detection is in no way limited to the methods described here.

X-ray Crystallography

The absence of electron density in single-crystal X-ray diffraction can be the result of failure of a region of the molecule to scatter X-rays coherently due to differences in the atomic positions from one protein to the next in the crystal. This absence of electron density often indicates regions of disorder in the structure. Additional experiments are needed to verify this conclusion. It is possible that a well folded domain could be ‘wobbly’ in that the whole domain moves as a rigid structure, so the domain has different positions from one protein to the next in the crystal [2, 55, 74]. These wobbly domains would also result in an absence of electron density for the whole domain. In addition, the missing electron density could be the result of a crystal defect or proteolysis during purification [2].

Circular Dichroism Spectropolarimetry

Circular dichroism (CD) spectropolarimetry provides global structural characterization of proteins in solution. CD is based on the differential absorption of left- and right-circularly polarized light, and is reported in terms of the difference in the electric field vectors, $\Delta E$, for the left- and right-circularly polarized light ($E_L$ and $E_R$, respectively) or in degrees of ellipticity ($\theta$) [75]. The ellipticity is defined as the arctangent of the ratio of the difference and the sum of the two electric field vectors (defining the minor and major axes of an ellipse) [75]:

\[ \tan\theta = \frac{\Delta E}{E_L + E_R} \tag{1.1} \]

The ellipticity values are generally normalized and reported as molar ellipticity, [θ], in units of deg·cm²·dmol⁻¹. Far-ultraviolet (far-UV) CD (240–180 nm range) is sensitive to the symmetry of the peptide bond environment and is therefore a fast method to characterize the content of secondary structure (α-helix, β-sheet, β-turn, and disordered) in a protein [2, 76]. Far-UV CD characterizes α-helices by a large positive band at 193 nm and negative bands at 208 and 222 nm; β-sheets and turns by a positive band at 193 nm and a negative band at 218 nm; and disorder (coil) by a negative band at 195 nm and very low ellipticity above 210 nm [75]. In the near-UV region (320–260 nm), CD is sensitive to the environment of the aromatic side-chains (Phe, Tyr, and Trp) and can give tertiary structural information [76]. However, CD in both the near- and far-UV ranges gives only global information (i.e., the fraction of the protein that is helical, sheet or coil, but not which residues are involved in the structured segments).
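As a worked illustration of Equation 1.1 and of the normalization to molar ellipticity, the following Python sketch may be helpful. It rests on two assumptions that are not stated in the text: the sign of ∆E is taken as E_R − E_L, and the normalization uses the commonly quoted relation [θ] = θ(mdeg) / (10 · c · l), with c in mol/L and l in cm, which yields deg·cm²·dmol⁻¹. The function names are hypothetical.

```python
import math


def ellipticity_deg(e_left, e_right):
    """Ellipticity in degrees from Eq. 1.1: tan(theta) = dE / (E_L + E_R).

    Assumption: dE is taken as E_R - E_L; the text does not fix the sign.
    """
    total = e_left + e_right
    if total == 0:
        raise ValueError("E_L + E_R must be non-zero")
    return math.degrees(math.atan((e_right - e_left) / total))


def molar_ellipticity(theta_mdeg, conc_molar, path_cm):
    """Molar ellipticity [theta] in deg cm^2 dmol^-1.

    theta_mdeg : observed ellipticity in millidegrees
    conc_molar : concentration in mol/L (assumed unit)
    path_cm    : optical path length in cm (assumed unit)
    """
    if conc_molar <= 0 or path_cm <= 0:
        raise ValueError("concentration and path length must be positive")
    return theta_mdeg / (10.0 * conc_molar * path_cm)
```

For example, an observed signal of −10 mdeg from a 10 µM sample in a 0.1 cm cell corresponds to a molar ellipticity of about −1 × 10⁶ deg·cm²·dmol⁻¹ under these assumptions.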

Protease Digestion

Susceptibility to protease degradation is a useful means of identifying flexible and exposed regions of proteins [2, 76, 77]. Protease attack on peptide bonds can occur only in exposed, flexible regions of the protein that are at least 8 to 10 residues long; cleavage therefore identifies the disordered segments connecting well-folded domains, or intrinsically disordered domains [76–79]. This method is particularly useful in determining the cause of missing electron density in X-ray structures and in discriminating disordered protein segments from wobbly domains [2]. The wobbly domains will remain as ordered structures once cleaved from the whole protein at their flexible linkers, whereas a disordered segment will be further digested into smaller fragments [2]. Proteolysis therefore provides a means of mapping the regions of disorder across the protein sequence and complements other biophysical methods of analysis.

Small-Angle X-Ray Scattering

Small-angle X-ray scattering (SAXS) is a useful method for solution studies of flexible, low-compactness macromolecules in the kDa–MDa range [75, 80]. SAXS provides low-resolution structural data that enable estimation of the size of the molecule via its radius of gyration ($R_g$) and of its degree of extension via the maximal intramolecular distance, $D_{MAX}$ [81]. SAXS reveals overall features of the molecule on the nanometre scale (size, tertiary and quaternary folds) but does not yield atomic-resolution details [80]. For dilute solutions of monodisperse, non-interacting particles, the scattering of X-rays is given by the intensity relation under the Guinier approximation for small angles [82, 83]. In short, $s$ is the modulus of the scattering vector (the momentum transfer) and is given by

\[ s = \frac{4\pi \sin\theta}{\lambda} \tag{1.2} \]

where 2θ is the total scattering angle and λ is the wavelength of the incident X-rays [84, 85]. For small momentum transfer, the scattering intensity is

\[ I(s) \approx I(0) \exp\left(-\frac{s^2 R_g^2}{3}\right) \tag{1.3} \]

where $I(0)$ denotes the intensity of forward scattering. Thus, for $sR_g < 1$, a plot of the natural logarithm of the scattering intensity against $s^2$, the square of the modulus of the scattering vector (the Guinier plot), will have a slope of $-R_g^2/3$ and an intercept of $\ln I(0)$ [86]. Larger values of $R_g$ will indicate disorder or lack of compactness [2]. From the $I(0)$ term, the molecular weight of the molecule can be inferred through the relation

\[ I(0) = \kappa c \, \Delta\rho^2 (MW)^2 \tag{1.4} \]

where $\kappa$ is a constant of proportionality determined from the measurement of a standard sample, $c$ is the concentration, $\Delta\rho$ is the average electron density contrast (difference in electron density between the macromolecule and the solvent) and $MW$ is the molecular weight [84]. The parameter $D_{MAX}$ is not obtained in such a straightforward manner and its description is beyond the scope of this thesis. Briefly, the scattering intensity can be described by a distribution function, $p(r)$, for the pairwise intramolecular atomic distances $r$, which can be obtained from an indirect Fourier transformation of the scattering profile [85, 86]. In this manner, the scattering intensity is described as

\[ I(s) = \int_0^{D_{MAX}} p(r) \, \frac{\sin(sr)}{sr} \, dr \tag{1.5} \]

Plotting $s^2 I(s)$ against $s$, referred to as a Kratky plot, will indicate the degree of compactness in the molecule. A typical globular protein will exhibit a bell-shaped distribution, whereas a disordered protein will have no clear maximum [77].
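The Guinier and Kratky analyses described above can be sketched in a few lines of Python. This is a minimal illustration using an ordinary least-squares line through (s², ln I), not the procedure of any particular SAXS package; the function names are hypothetical, the caller is assumed to supply only points from the region where s·Rg < 1, and Rg is returned in the reciprocal of whatever unit is used for s.

```python
import math


def guinier_fit(s_values, intensities):
    """Estimate (Rg, I0) from a straight-line fit of ln I(s) against s^2.

    From Eq. 1.3, ln I(s) = ln I(0) - (Rg^2 / 3) * s^2, so the slope of the
    Guinier plot is -Rg^2/3 and the intercept is ln I(0).
    """
    if len(s_values) != len(intensities) or len(s_values) < 2:
        raise ValueError("need at least two matching (s, I) points")
    x = [s * s for s in s_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("s values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: data are not in a Guinier regime")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


def kratky_points(s_values, intensities):
    """Return (s, s^2 * I(s)) pairs for a Kratky plot."""
    return [(s, s * s * i) for s, i in zip(s_values, intensities)]
```

With synthetic data generated from Equation 1.3, the fit recovers the input Rg and I(0) to within floating-point error; with real data, the choice of the fitting range matters far more than the fitting method.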

Nuclear Magnetic Resonance Spectroscopy

Nuclear magnetic resonance (NMR) spectroscopy is uniquely applicable to the determination of three-dimensional biomolecular structures at the atomic level in solution [87]. NMR is also able to provide structural and dynamic details of flexible and disordered solution-state proteins at the atomic level [88]. The one-dimensional ¹H NMR spectrum can indicate disorder through the amount of dispersion of the resonances and the widths of the lines [88]. Poorly dispersed signals in the proton dimension indicate disorder. Isotope enrichment of the protein samples in ¹⁵N and ¹³C allows multidimensional experiments to be performed that take advantage of the increased dispersion in the ¹⁵N and ¹³C dimensions of two- and three-dimensional experiments. High magnetic field spectrometers also provide increased dispersion in the signals and, in combination with isotope labelling, can overcome the serious crowding and overlap of resonances [89]. Figure 1.5 shows representative ¹H-¹⁵N correlation spectra of the N-terminal SH3 domain of the protein drk (drkN SH3) in the folded and unfolded states [90] and demonstrates the reduction in the dispersion of resonances (particularly the protons) between the folded and unfolded states of the protein.

Figure 1.5: ¹H-¹⁵N correlation spectra from the first time point in a T1 relaxation series (t = 0.011 s) of (a) the folded state of drkN SH3 and (b) the guanidinium chloride unfolded state of the drkN SH3 domain. Spectra were recorded at 14 °C on a 600 MHz spectrometer. In (b), the peak at 7.3 and 115 ppm corresponds to the signal arising from guanidinium chloride. Reproduced from [90] with permission of L. E. Kay.

One of the main advantages of NMR spectroscopy is the ability to obtain detailed dynamic information on the atomic motions (particularly of the ¹⁵N and ¹³C nuclei) over a range of timescales that spans 12 orders of magnitude (picosecond–second) [91]. However, NMR is hampered by the fact that it requires sample concentrations in the 1-2 mM range. This concentration condition can be a serious problem in the study of the disordered state since disordered proteins are prone to aggregation at these concentrations [88]. Thus, preparing and maintaining a monodisperse sample may be difficult.

The characterisation of the disordered states of proteins is not limited to the aforementioned methods, but they are the most common. For a more in-depth review of detecting and characterising disordered proteins, the reader is directed to [77] and [92]. The remaining chapters of this thesis describe NMR relaxation and analysis of the disordered state of the HIV-1 Tat protein.

1.6 The Human Immunodeficiency Virus

Despite some disagreements in naming and credit for discovery, human immunodeficiency virus (HIV) has been understood for more than 20 years to be the causative agent of acquired immunodeficiency syndrome (AIDS) [93–95]. HIV belongs to the lentivirus genus of retroviruses, which are distinguished by their cone-shaped capsid core [96]. Retroviruses are defined by their single-stranded RNA genomes, which are reverse-transcribed into DNA and then integrated into the DNA of the infected host cells (the provirus) [97]. The major cells infected and depleted by HIV are the CD4+ T-lymphocytes, which play a critical role in the immune response [98].

The mature HIV virion, depicted in Figure 1.6, is enveloped by a lipid bilayer that is derived from its host cell. The lipid bilayer also contains some host cell membrane proteins. The surface of the virion is covered with glycoproteins (gp) which are involved in binding to host cell surface receptors, the most common of which is the immunoglobulin-like protein CD4 [99]. The surface unit (SU) is made of a trimeric complex of glycoprotein 120 (gp120) and is attached to the virion through a transmembrane (TM) trimeric complex of glycoprotein 41 (gp41) [100,101]. The transmembrane complex may also aid in the fusion of the virion and host cell, initiated through an N-terminal glycine-rich peptide [99]. Beneath the lipid bilayer is the matrix (MA) complex of proteins (p17). The conical capsid is a complex of approximately 1500 capsid (CA) proteins (p24) [102]. The capsid contains the viral genome (two single strands of RNA) along with several viral enzymes necessary for early replication steps [103]. The inner core contains the viral enzyme reverse transcriptase (RT or p51), which processes the viral RNA into viral DNA inside the host cell; integrase (IN or p31), which inserts the viral DNA into the host DNA in the nucleus; nucleocapsid (NC or p9), which functions to deliver unspliced RNA for assembly of new virions; protease (PR), which cleaves viral polyproteins into their functional units; and negative factor (Nef), which is primarily involved in the down-regulation of CD4 surface expression on the host cell (it increases the rate of CD4 endocytosis and degradation by lysosomes [104]) but may also serve to enhance envelope (Env) incorporation into new virions and facilitate budding and release [99].

[Figure 1.6: schematic of the HIV-1 virion. Labelled components: SU (gp120), TM (gp41), Nef, PR, CA (p24), NC (p9), RT (p51), RNA, lipid membrane, MA (p17), IN (p31).]

Figure 1.6: The characteristic features of the HIV-1 virion showing the conical capsid comprised of ∼1500 copies of the capsid protein (CA or p24). The core contains the viral diploid single-strand RNA [103], nucleocapsid protein (NC or p9), protease (PR), negative factor (Nef), integrase (IN or p31) and reverse transcriptase (RT or p51). The capsid core is enclosed in a protein matrix (MA or p17). The matrix is enveloped by a lipid bilayer derived from the host cell along with some host cell proteins. The surface unit (SU) is comprised of trimers of glycoprotein 120 (gp120) which are anchored to the envelope via the transmembrane (TM) complex consisting of a trimer of glycoprotein 41 (gp41).

There are nine open reading frames in the HIV-1 genome [99] as depicted in Figure 1.7. The group-specific antigen (gag) gene encodes a polyprotein (Gag) that contains the major structural components of the virus core (matrix, capsid, and nucleocapsid). The pol gene encodes a polyprotein (Pol) containing reverse transcriptase, integrase and protease. Protease (also contained in the mature virion) cleaves the Gag and Gag-Pol polyproteins into the individual protein units [96]. The envelope (env) gene encodes the Env proteins glycoprotein 120 and glycoprotein 41, which make up the surface unit and transmembrane complexes [99]. The six additional reading frames are the genes for the regulatory proteins trans-activator of transcription (Tat) and regulator of expression of virion proteins (Rev), and the accessory proteins viral infectivity factor (Vif), viral protein U (Vpu), viral protein R (Vpr), and negative factor (Nef) [96].

[Figure 1.7: map of the HIV genome. Labelled elements: LTR (with promoter), gag, pol, vif, vpr, vpu, env, tat (two segments), rev (two segments), nef, LTR.]

Figure 1.7: Open reading frames of the HIV genome. The HIV long terminal repeat (LTR) has an inducible promoter [105] followed by the genes for: group-specific antigen (gag) encoding a polyprotein containing the major structural components of the matrix, capsid and nucleocapsid complexes; polymerase (pol) encoding another polyprotein containing the viral enzymes protease, reverse transcriptase and integrase; viral infectivity factor (vif); viral protein R (vpr); viral protein U (vpu); envelope (env) encoding the surface and transmembrane glycoproteins; and the regulatory proteins trans-activator of transcription (tat) and regulator of expression of virion proteins (rev). Both of the regulatory proteins are encoded by two exons.

The general features of the HIV life-cycle are depicted in Figure 1.8 for infection of a CD4+ T-lymphocyte. Upon gp120 recognition and binding to the cell surface receptor (in this case CD4), the virion attaches itself to the cell. Additional interactions with host cell surface chemokine receptors (CXCR4 or CCR5) induce a conformational change in the CD4 receptor that allows for fusion of the viral envelope with the cell plasma membrane [96,106]. The fusion and release of the viral core are not well understood processes. Once the diploid viral RNA is released into the cell, it is processed by the viral enzyme reverse transcriptase to double-stranded DNA. A pre-integration complex results from the association of the viral DNA with a complex that contains at least integrase, matrix protein, and reverse transcriptase [96]. The pre-integration complex crosses the nuclear membrane and the viral DNA is integrated into the host DNA by integrase to become the provirus.

[Figure 1.8: schematic of the HIV life-cycle. Labelled steps and components: mature virion; recognition and binding (CD4, chemokine receptor); fusion/penetration; reverse transcription (RT, viral RNA, viral DNA); proviral pre-integration complex; integration (nucleus, host DNA, provirus); expression (spliced RNA, unspliced RNA, Tat, Rev); ribosomal translation (endoplasmic reticulum, Gag, Gag-Pol); assembly; budding (immature virion); mature virion.]

Figure 1.8: The general features of the HIV life-cycle upon infection of a CD4+ T-lymphocyte.

Expression of the virus results in a number of mRNAs of varying length. These viral RNAs fall into three categories: unspliced, partially spliced and multiply spliced [106]. The full length, unspliced RNAs can exit the nucleus for translation, or assemble at the cell membrane for packaging into a new virion. Initially, multiply-spliced mRNAs are transported to the cytoplasm and translated into the viral regulatory proteins Tat and Rev along with the accessory protein Nef. The Rev protein ensures that full length viral transcripts leave the nucleus and enter the cytoplasm for Gag and Gag-Pol synthesis and assembly of new virions [96]. The viral regulatory protein Tat is essential for virus expression and its function is described in more detail in the next section.

1.7 The HIV-1 Trans-Activator of Transcription

The HIV-1 trans-activator of transcription (Tat) is a small regulatory protein essential to the viral life cycle. Tat is a 101-residue protein that is encoded by two exons and is expressed during the early stages of viral infection [107]. In addition to its role as a transcriptional regulator of HIV gene expression, Tat has been implicated in a number of extracellular activities including supporting endothelial cell proliferation (contributing to the development of Kaposi's sarcoma) [108–110], inducing apoptosis of T cells [111], inducing cell death of neurons [112, 113], decreasing expression of tight junction proteins [114], disrupting the blood-brain barrier (BBB) [115], and inducing oxidative stress [113, 116]. Tat may also be involved in derepression of heterochromatin, in transcription initiation [117], and in reverse transcription [118]. Absence of Tat and low levels of CDK9 and cyclin T1 in resting CD4+ T-cells are all implicated in HIV-1 latency [119]. In general, the pathological activities of Tat contribute to both immune and non-immune dysfunction, resulting in an overall increase in the impact of the viral infection.

A major event in the progression of HIV is neuronal damage, despite the fact that neurons cannot be infected with the virus [120]. Tat can be released from infected cells within the central nervous system (CNS) (microglia and astrocytes) [121] and is able to cross the BBB [115], resulting in apoptosis of neurons. Two of the resulting central nervous system pathologies are HIV-associated dementia (HAD) and HIV-associated encephalitis (HIVE) [121,122]. Several HIV proteins have been implicated in neural dysfunction including (but not limited to) gp120, gp41, Rev, Nef and Tat [123].

The Tat amino acid sequence has a low overall hydrophobicity and a high net positive charge, and analyses by several disorder prediction algorithms suggest that it is intrinsically disordered with a possible folding nucleus between residues 42 and 75; early CD spectropolarimetry experiments suggested a lack of secondary structure [124]. The first tat exon defines amino acids 1–72 (shown in Figure 1.9) that encompass an acidic and proline-rich N-terminus (1–21), a cysteine-rich region (22–37), a core (38–47), a basic region (48–57), and a Gln-rich segment (58–72) [125]; it activates transcription with the same proficiency as the full-length protein [126–129]. Residues 1–24 form the binding site for the KIX domain of the co-activator and acetyltransferase CBP (CREB-response element binding protein) [124]. Cyclin T1 is thought to interact with the Cys-rich region of Tat [130]; mutation of any one of six of the seven Cys residues results in loss of transactivation [126]. The end of the Cys-rich region and the core are involved in mitochondrial apoptosis of bystander non-infected cells through their ability to bind tubulin and prevent its depolymerization [131]. The basic region is important for TAR RNA binding (see below) [132] and nuclear localization; the segment between Tyr-47 and Arg-57 has been used to transport a large variety of materials including proteins, DNA, drugs, imaging agents, liposomes, and nanoparticles across cell and nuclear membranes [133]. The Gln-rich region has been implicated in mitochondrial apoptosis of T-cells [134].

[Figure 1.9: sequence diagram. Region labels, left to right: Pro-rich, Cys-rich, core, basic, Gln-rich, followed by the exon 2 segment ending at residue 101. Residue numbers are marked every 10 residues from 10 to 70. Sequence as grouped in the figure:
MEPVDPRLEPWKHPGSQPKTA CTNCYCKKCCFHCQVC FITKALGISYG RKKRRQRRRPP QGSQTHQVSLSKQ]

Figure 1.9: The HIV-1 Tat sequence (BH10 isolate) encoded by exon 1. The 72-residue segment encompasses an N-terminal proline-rich region (1–21) containing the only three acidic residues, a cysteine-rich region (22–37), a core (38–47), a basic region (48–57), and a Gln-rich segment (58–72). Residues 73–101 are encoded by exon 2. The 72-residue segment encoded by exon 1 activates transcription with the same proficiency as the full-length protein.
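The composition quoted for the exon-1 sequence can be tallied directly from the sequence printed in Figure 1.9. The following sketch is only a bookkeeping aid: the domain boundaries are those given in the caption, the helper names are hypothetical, and the charge is a formal count (Lys and Arg minus Asp and Glu) that assumes histidines are neutral and that the charges of the two termini cancel.

```python
# Exon-1 sequence as printed in Figure 1.9 (BH10 isolate), grouped as in the figure.
TAT_EXON1 = (
    "MEPVDPRLEPWKHPGSQPKTA"
    "CTNCYCKKCCFHCQVC"
    "FITKALGISYG"
    "RKKRRQRRRPP"
    "QGSQTHQVSLSKQ"
)

# Domain boundaries (1-based, inclusive) as quoted in the caption.
DOMAINS = {
    "Pro-rich": (1, 21),
    "Cys-rich": (22, 37),
    "core": (38, 47),
    "basic": (48, 57),
    "Gln-rich": (58, 72),
}

def segment(name):
    """Return the residues of the named domain."""
    start, end = DOMAINS[name]
    return TAT_EXON1[start - 1:end]

def formal_charge(seq):
    """Lys + Arg minus Asp + Glu; His and the termini are ignored (assumption)."""
    positive = sum(seq.count(res) for res in "KR")
    negative = sum(seq.count(res) for res in "DE")
    return positive - negative

for name in DOMAINS:
    seq = segment(name)
    print(f"{name:9s} {seq:22s} Cys={seq.count('C')} charge={formal_charge(seq):+d}")
print("length:", len(TAT_EXON1),
      "Cys:", TAT_EXON1.count("C"),
      "formal net charge:", formal_charge(TAT_EXON1))
```

Under these assumptions the tally reproduces the seven cysteines, the three acidic residues confined to the N-terminal region and the formal net charge of +12 quoted for the protein.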

The second tat exon defines residues 73–101 and includes an RGD motif that may mediate Tat binding to cell surface integrins [135]. The function of the second exon-encoded polypeptide has, thus far, been difficult to determine [118,129]. Studies have shown that the peptide encoded by the second exon is involved in repressing expression of major histocompatibility complex (MHC) class I molecules, whose presence at the cell surface serves as a target for cytotoxic T lymphocytes [136–138]. This repressive function of the exon-2 encoded peptide may contribute to HIV-infected cells escaping an immune response [136,137]. There are several laboratory strains of the Tat protein with 86 residues that may originate from the HXB2 strain (subtype B) commonly found in Europe and North America [139]. These 86-residue variants are not found in natural viral isolates [140]. It has been suggested that the 86-residue form of Tat was a consequence of tissue culture passaging and that a single nucleotide correction in the laboratory genomes yields the expected 101-residue protein from the Tat coding frame [141].

During transcription of the HIV viral DNA, RNA Polymerase II (RNAP II) is halted as a result of binding to negative transcription elongation factor (N-TEF), leading to prematurely terminated transcripts that may include the tat message [126]. It has also been suggested that the early HIV proteins, Nef, Tat and Rev, may result from transcription of non-integrated viral DNA [142]. Regardless of Tat's origin, following translation it is transported from the cytoplasm into the nucleus where it binds to a stable, nuclease-resistant, stem-loop structure referred to as the trans-activation response (TAR) element. The TAR element is located downstream of the long terminal repeat (LTR) and spans nucleotides +1 to +59 of the nascent RNA [143]. Tat stimulates elongation of full-length transcripts by recruiting the positive transcription elongation factor b (P-TEFb), a hetero-dimeric complex of a regulatory cyclin T and cyclin-dependent kinase 9 (CDK9). Upon formation of the P-TEFb/Tat-TAR complex, CDK9 is brought into close proximity to the carboxy-terminal domain (CTD) of RNAP II. CDK9 can then hyperphosphorylate the CTD of RNAP II, the components of N-TEF, and the transcription elongation factor Spt5 [144–146]. Recent results suggest that Tat activates P-TEFb by displacing Hexim1 (hexamethylene bisacetamide-inducible protein 1) from its cyclin T1 binding site [147] and that the affinity of the Tat-cyclin T1-CDK9 complex for TAR is regulated through Tat acetylation by histone acetyl transferase (HAT) [148, 149].

Tat binds directly to TAR, as depicted in Figure 1.10, through electrostatic interactions between its basic arginine-rich region and the negatively charged phosphates at a stem-loop UCU-bulge (uridine23-cytidine24-uridine25) of the RNA, and the complex has a dissociation constant of Kd = 12 nM [150]. The two base pairs immediately above the TAR bulge (G26:C39 and A27:U38) are also believed to be critical for Tat recognition, and the two base pairs below the bulge (A22:U40 and G21:C41) also contribute to the binding affinity [151]. Phosphates at positions 22, 23 and 40 on the RNA are also critical for Tat binding interactions [152]. The basic Arg-rich and Gln-rich regions of Tat govern the binding affinity of Tat to TAR RNA, but it is the core region that seems to control the specificity of Tat for the TAR element [153, 154].

[Figure 1.10: schematic of the TAR RNA stem-loop (5' and 3' ends labelled) with bound Tat, cyclin T1 and CDK9 (TAK).]

Figure 1.10: The Tat-TAK-TAR complex. The regulatory complex formed by recognition of the TAR stem-loop bulge by Tat and the Tat-associated kinases (TAK). Tat recognition primarily involves interactions between the Arg-rich region of its basic domain and the phosphates of the UCU bulge in the TAR element of the RNA. The Tat-cyclin T1 interaction may involve cysteine residues in both proteins through coordination with zinc ions. Reproduced from J. Mol. Biol., 293(2) pp.235–254, J. Karn, “Tackling Tat”, Copyright (1999), with permission from Elsevier and J. Karn.

There have been several attempts to determine solution conformations of Tat and its segments, both alone and in complexes. Most of these studies suffered from poor resolution in homonuclear ¹H NMR experiments on unlabelled protein. However, ¹H NMR spectroscopy and molecular dynamics simulations suggested that Tat1−86 (Z-variant) forms condensed domains encompassing the core and Gln-rich regions, whereas the basic and Cys-rich regions were found to be highly flexible at pH 6.3 under reducing conditions [155]. In a model of the 87-residue Tat Mal protein at pH 4.5 under oxidizing conditions, the N-terminal Trp-11 forms a hydrophobic core through interactions with Phe-38 and Tyr-47 [156]. The basic region is in an extended conformation and the Cys-rich region contains β-turns; an α-helix is found in the Gln-rich segment. A low-resolution, globular conformation with some flexible segments (particularly in the basic region) was deduced for ¹³Cα-Gly-labelled synthetic Tat1−86 (Bru) at pH 4.5, in the absence of reducing agents [157]. An oxidized Tendamistat-Tat1−37 fusion protein showed multiple conformations with some evidence of helicity in the Cys-rich region (20–33) at pH 3.5 [158]. A fusion protein consisting of the activation domain from the unrelated Equine Infectious Anemia Virus and Tat48−57 showed high helical content in the basic domain by NMR spectroscopy and CD [159]. There have also been several studies of Tat fragments in complex with TAR RNA, mainly focusing on the conformation of TAR [152,160–163]. NMR spectroscopy suggested a conformational change in Tat32−72, in the region of Gly-42 and Gly-44, upon binding to TAR [162]. ¹H NMR also showed that Tat46−55, acetylated at Lys-50, is bound in an extended conformation to the bromodomain of p300/CBP-associated factor (PCAF), a HAT transcriptional coactivator [149]. CD spectra suggested the possibility of a conformational change in Tat1−86 upon binding to the KIX domain of CBP [124]. ¹⁵N NMR relaxation measurements showed that Tat47−58 becomes slightly more ordered on binding heparin [164], while CD studies of overlapping peptide fragments suggested that the most flexible regions of Tat are those that are adjacent to the basic region [165].

1.8 NMR Investigation of the Structure and Dynamics of Tat

In order to more clearly define the role of Tat in regulating transcription, as well as its extracellular activities, the determination of its molecular structure is critically important. However, with its low amino acid sequence complexity, low overall hydrophobicity, and high net positive charge, Tat has all of the indicators of intrinsic disorder. Previous homonuclear NMR studies of Tat have shown that amide proton chemical shifts of the protein are within the range characteristic of disordered proteins [156–158]. In order to gain a greater understanding of Tat and its multifaceted activities, one should observe the behaviour of the protein in solution with and without its many binding partners. High-resolution multidimensional heteronuclear NMR will afford the greatest amount of information on the structural and dynamic properties of Tat in solution. However, it has thus far been difficult to obtain a monodisperse solution, in particular a monomeric solution, of the protein at concentrations amenable to NMR because the readily oxidized Cys-rich region of the protein leads to mixtures of soluble aggregates. The high net positive charge of the protein also poses difficulties in that it causes the protein to stick to many charged surfaces, including glass, and to polyanionic species in cell lysates (DNA and RNA).

In order to study Tat both alone and in the presence of binding partners, isotopic labelling of the protein is necessary to resolve the complicated and crowded spectra. Isotopic enrichment of one protein in a complex will allow filtering of NMR signals and will reduce the complexity of the spectra and their analysis. To this end, one of the goals of this project was to develop a protocol for biological expression and purification of isotopically enriched Tat (in ¹³C and ¹⁵N) at yields amenable for study by NMR. The resulting protein samples were also required to be monodisperse and preferably monomeric. With isotopic enrichment, the ¹H NMR resonances of a disordered protein can be more rigorously assigned. Finally, with the resonance assignments in hand, another goal of this research was to characterize the structure and dynamics of the protein by multinuclear NMR spectroscopy.

This thesis presents a protocol for the bacterial expression of isotopically enriched recombinant Tat (residues 1–72) for structural and dynamics studies by NMR. This protocol has been used to prepare NMR-quality samples of Tat1−72 in an unambiguously reduced and monomeric state for the assignment of the protein backbone resonances [166]. These preparations have permitted the amide backbone ¹⁵N-relaxation rates and steady-state heteronuclear ¹H-¹⁵N NOEs to be measured for most residues in the protein and used to gain insight into the molecular motions of this intrinsically disordered protein.

Chapter 2

Spectral Densities, Relaxation and Dynamics in Nuclear Magnetic Resonance Spectroscopy

Preface

The following treatment of the relaxation of spins in nuclear magnetic resonance (NMR) spectroscopy is based primarily on the description in Abragam (1961) [167]. However, several aspects of the Abragam description are modified to be consistent with the work of others published subsequent to Abragam's pivotal text. Some of the notation has also been changed to avoid confusion with other standard notation schemes (e.g. Abragam's interaction frame is denoted by “∗”, which also denotes the complex conjugate in many mathematical texts). Some additional texts are noted during the course of this treatment as they provide more appropriate descriptions of some aspects of the discussion. This treatment is intended to provide the reader with a detailed description of the development of the relaxation equations used in most heteronuclear NMR studies to provide dynamics information and ultimately the data for Model-Free estimation of dynamics parameters.

2.1 Semi-Classical Description of Relaxation

The theory of relaxation has four descriptions of varying complexity [168]:

(i) the phenomenological Bloch equations, where relaxation is described in terms of a first-order rate process to return the magnetization to equilibrium;
(ii) second-order perturbation theory, where longitudinal relaxation rates account for the transition probabilities between distinct eigenstates caused by coupling of the nuclei to the lattice;
(iii) semi-classical relaxation, where the lattice, with a large number of degrees of freedom, is considered to be a continuous distribution of lattice states [167];
(iv) a full quantum mechanical treatment of the lattice (the most fundamental description), which becomes necessary at very low temperatures where only a fraction of the number of degrees of freedom of the lattice are excited [167].

In the semi-classical approach to describing nuclear relaxation, the spin system is treated quantum mechanically and the surroundings (lattice) are treated classically. A drawback of this treatment is that the spin system evolves toward a final state in which the energy levels of the spin system are equally populated. Equivalently, the semi-classical theory is formally correct only for an infinite Boltzmann spin temperature; at finite temperatures a correction to the theory is required to ensure that the spin system relaxes toward an equilibrium in which populations are described by a Boltzmann distribution. The completely quantum mechanical description of spin relaxation does not suffer from the problems associated with predicting the system reaching proper equilibrium, but is consequently far more complicated in its computation and therefore beyond the scope of this treatment.

2.1.1 The Master Equation of Relaxation

In the semi-classical theory of spin relaxation, the Hamiltonian for the system is written as the sum of a deterministic quantum mechanical Hamiltonian, $H_{det}(t)$, that acts only on the spin system and a stochastic Hamiltonian, $H_1(t)$, that couples the spin system to the lattice:

\[
\begin{aligned}
H(t) &= H_{det}(t) + H_1(t) \\
     &= H_0 + H_{rf}(t) + H_1(t)
\end{aligned} \tag{2.1}
\]

where $H_0$ represents the Zeeman and scalar coupling Hamiltonians and $H_{rf}(t)$ is the Hamiltonian for any applied radio frequency fields. The equation describing the evolution of the density operator is given by

\[
\frac{d\sigma(t)}{dt} = -i[H(t), \sigma(t)] \tag{2.2}
\]

The Hamiltonians $H_{rf}(t)$ and $H_1(t)$ are time-dependent perturbations acting on the main time-independent Hamiltonian $H_0$. The explicit influence of $H_0$ can be removed by transforming (2.2) into the interaction representation where every operator Q is replaced by

\[
\tilde{Q} = e^{iH_0 t}\, Q\, e^{-iH_0 t} \tag{2.3}
\]

The interaction representation is a unitary transformation of each operator by $U(t) = e^{iH_0 t}$ and $U^\dagger(t) = e^{-iH_0 t}$ (the adjoint of $U(t)$), and

\[
\frac{dU(t)}{dt} = iH_0 e^{iH_0 t} = iH_0 U(t) \tag{2.4}
\]

Then, $\tilde{\sigma} = U\sigma(t)U^\dagger$ and $\tilde{H}_1 = U H_1(t) U^\dagger$. Consider a system in the absence of an rf field, where the Hamiltonian is of the form $H(t) = H_0 + H_1(t)$ and

\[
\frac{d\sigma}{dt} = -i[H_0 + H_1(t), \sigma(t)] \tag{2.5}
\]

In the interaction frame the evolution of the density operator is described by:

\[
\frac{d\tilde{\sigma}}{dt} = \frac{d}{dt}\left(U\sigma(t)U^\dagger\right) \tag{2.6}
\]

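Substitution of equation (2.5) into the expansion of (2.6) yields

\[
\begin{aligned}
\frac{d\tilde{\sigma}}{dt} = \frac{d}{dt}\left(U\sigma(t)U^\dagger\right)
  &= \frac{dU}{dt}\sigma U^\dagger + U\frac{d\sigma}{dt}U^\dagger + U\sigma\frac{dU^\dagger}{dt} \\
  &= iUH_0\sigma U^\dagger - iU[H_0 + H_1(t), \sigma]U^\dagger - iU\sigma H_0 U^\dagger \\
  &= -iU\{-H_0\sigma + H_0\sigma + H_1\sigma - \sigma H_0 - \sigma H_1 + \sigma H_0\}U^\dagger \\
  &= -iU\{H_1\sigma - \sigma H_1\}U^\dagger \\
  &= -iU[H_1, \sigma]U^\dagger \\
  &= -i[UH_1U^\dagger, U\sigma U^\dagger] \\
  &= -i[\tilde{H}_1, \tilde{\sigma}]
\end{aligned} \tag{2.7}
\]

The equation in (2.7) can be solved by successive approximation up to the second order as follows:

\[
\begin{aligned}
\frac{d\tilde{\sigma}(t')}{dt'} &= -i[\tilde{H}_1(t'), \tilde{\sigma}(t')] \\
\int_0^t d\tilde{\sigma}(t') &= -i\int_0^t [\tilde{H}_1(t'), \tilde{\sigma}(t')]\,dt'
\end{aligned} \tag{2.8}
\]

\[
\tilde{\sigma}(t) = \tilde{\sigma}(0) - i\int_0^t [\tilde{H}_1(t'), \tilde{\sigma}(t')]\,dt' \tag{2.9}
\]

or equivalently,

\[
\tilde{\sigma}(t') = \tilde{\sigma}(0) - i\int_0^{t'} [\tilde{H}_1(t''), \tilde{\sigma}(t'')]\,dt'' \tag{2.10}
\]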

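Substitution of (2.10) into (2.9) yields

\[
\begin{aligned}
\tilde{\sigma}(t) &= \tilde{\sigma}(0) - i\int_0^t \left[\tilde{H}_1(t'),\; \tilde{\sigma}(0) - i\int_0^{t'} [\tilde{H}_1(t''), \tilde{\sigma}(t'')]\,dt''\right] dt' \\
  &= \tilde{\sigma}(0) - i\int_0^t [\tilde{H}_1(t'), \tilde{\sigma}(0)]\,dt' + i^2\int_0^t \left\{\int_0^{t'} [\tilde{H}_1(t'), [\tilde{H}_1(t''), \tilde{\sigma}(t'')]]\,dt''\right\} dt'
\end{aligned} \tag{2.11}
\]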

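Again, (2.11) can be rewritten for $\tilde{\sigma}(t'')$ as

\[
\tilde{\sigma}(t'') = \tilde{\sigma}(0) - i\int_0^{t''} [\tilde{H}_1(t'''), \tilde{\sigma}(0)]\,dt''' - \int_0^{t''} \left\{\int_0^{t'''} [\tilde{H}_1(t'''), [\tilde{H}_1(t''''), \tilde{\sigma}(t'''')]]\,dt''''\right\} dt''' \tag{2.12}
\]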

Repeating the above procedure and substituting (2.12) back into (2.11) leads to

\[
\tilde{\sigma}(t) = \tilde{\sigma}(0) - i\int_0^t [\tilde{H}_1(t'), \tilde{\sigma}(0)]\,dt' - \int_0^t \left\{\int_0^{t'} [\tilde{H}_1(t'), [\tilde{H}_1(t''), \tilde{\sigma}(0)]]\,dt''\right\} dt' + \text{higher order terms} \tag{2.13}
\]

If the higher order terms are dropped and $\tilde{\sigma}(t)$ is truncated to a second order approximation, a differential equation can be obtained by differentiating (2.13) with respect to t:

\[
\frac{d\tilde{\sigma}(t)}{dt} = -i[\tilde{H}_1(t), \tilde{\sigma}(0)] - \int_0^t [\tilde{H}_1(t), [\tilde{H}_1(t''), \tilde{\sigma}(0)]]\,dt'' \tag{2.14}
\]

Applying the change of variable $\tau = t - t''$ to (2.14) leads to

\[
\frac{d\tilde{\sigma}(t)}{dt} = -i[\tilde{H}_1(t), \tilde{\sigma}(0)] - \int_0^t [\tilde{H}_1(t), [\tilde{H}_1(t - \tau), \tilde{\sigma}(0)]]\,d\tau \tag{2.15}
\]
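The starting point of this expansion, the interaction-frame equation of motion (2.7), can be checked numerically for a single spin-1/2. The sketch below is illustrative only and makes several assumptions not taken from the text: $H_0 = \omega I_z$, a perturbation $H_1 = b I_x$ that is held constant rather than random, arbitrary frequency units, and hypothetical helper names. It propagates σ(t) in the laboratory frame according to (2.5), transforms it with U(t), and compares a finite-difference time derivative of the transformed operator with $-i[\tilde{H}_1, \tilde{\sigma}]$.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def combine(a, b, cb=1.0):
    """Return a + cb * b for 2x2 matrices."""
    return [[a[i][j] + cb * b[i][j] for j in range(2)] for i in range(2)]

def scale(a, c):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def commutator(a, b):
    return combine(matmul(a, b), matmul(b, a), -1.0)

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
IX = [[0.0, 0.5], [0.5, 0.0]]
IZ = [[0.5, 0.0], [0.0, -0.5]]

OMEGA, B1 = 1.0, 0.2                 # hypothetical frequencies, arbitrary units
H0 = scale(IZ, OMEGA)                # main Hamiltonian
H1 = scale(IX, B1)                   # perturbation, constant here for simplicity
H = combine(H0, H1)
SIGMA0 = [[0.7, 0.1], [0.1, 0.3]]    # arbitrary Hermitian starting operator, unit trace

def propagator(h, rate, t):
    """exp(-i h t) for a traceless 2x2 Hermitian h with h*h = (rate/2)^2 * identity."""
    half = 0.5 * rate * t
    return combine(scale(IDENTITY, math.cos(half)), scale(h, -2j * math.sin(half) / rate))

def sigma_tilde(t):
    p = propagator(H, math.hypot(OMEGA, B1), t)     # laboratory-frame evolution, eq. (2.5)
    sigma = matmul(matmul(p, SIGMA0), dagger(p))
    u = propagator(H0, OMEGA, -t)                   # U(t) = exp(+i H0 t)
    return matmul(matmul(u, sigma), dagger(u))

def residual(t, step=1e-5):
    """Largest element of d(sigma~)/dt + i[H1~, sigma~], using a central difference."""
    lhs = scale(combine(sigma_tilde(t + step), sigma_tilde(t - step), -1.0), 1.0 / (2.0 * step))
    u = propagator(H0, OMEGA, -t)
    h1_tilde = matmul(matmul(u, H1), dagger(u))
    rhs = scale(commutator(h1_tilde, sigma_tilde(t)), -1j)
    diff = combine(lhs, rhs, -1.0)
    return max(abs(diff[i][j]) for i in range(2) for j in range(2))

print("residual of (2.7) at t = 0.7:", residual(0.7))
```

A residual close to zero (limited by the finite-difference step) indicates that, in the interaction frame, the density operator evolves under the transformed perturbation alone.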

Remark 1. The Hamiltonian $H_1(t)$ is a random function with vanishing average value ($\overline{H_1(t)} = \overline{\tilde{H}_1(t)} = 0$).

Remark 2. Since $H_1(t)$ is a random operator, then so is $\tilde{\sigma}(t)$ of (2.13). The observable behaviour of a statistical ensemble will be described by an average density operator $\overline{\tilde{\sigma}}$ which obeys an equation generated by taking the ensemble average on both sides of (2.15) over all the random Hamiltonians $H_1(t)$.

To obtain the corresponding equation for the evolution of the density operator in a macroscopic sample, both sides of (2.15) must be averaged over the ensemble of subsystems. The ensemble average is performed under the following assumptions [167, 169]:

(i) The ensemble average of $\tilde{H}_1(t)$ is zero. Any non-vanishing components of $\tilde{H}_1(t)$ after averaging over the ensemble can be included with $H_0$.

(ii) The ensemble averages of $\tilde{H}_1(t)$ and $\tilde{\sigma}(0)$ can be calculated independently of each other.

Remark 3. In liquids, the characteristic correlation time, $\tau_c$, for $\tilde{H}_1(t)$ is much shorter than t; it is on the order of the rotational diffusion correlation time for the molecule ($10^{-12}$ to $10^{-8}$ s) [169].

(iii) Given the assumption in (ii), it is permissible to replace $\tilde{\sigma}(0)$ with $\tilde{\sigma}(t)$ in (2.15).

(iv) The upper limit of integration in (2.15) can be extended to $+\infty$.

(v) The higher order terms that would have been in (2.15), had the expression in (2.13) not been truncated to a second order approximation, can be neglected.

With these assumptions, (2.15) becomes

\[
\frac{d\tilde{\sigma}(t)}{dt} = -\int_0^\infty \overline{[\tilde{H}_1(t), [\tilde{H}_1(t - \tau), \tilde{\sigma}(t)]]}\,d\tau \tag{2.16}
\]

since

\[
-i\,\overline{[\tilde{H}_1(t), \tilde{\sigma}(0)]} = -i[\overline{\tilde{H}_1(t)}, \tilde{\sigma}(0)] = -i[0, \tilde{\sigma}(0)] = 0
\]

and where $d\tilde{\sigma}(t)/dt$ is an ensemble average (overbar omitted).

Remark 4. $\tilde{\sigma}$ will henceforth stand for the average density matrix.

The semi-classical treatment of the coupling of the spin system to the lattice as a random perturbation should be corrected by replacing $\tilde{\sigma}(t)$ with $\tilde{\sigma}(t) - \tilde{\sigma}_0$, where

\[
\tilde{\sigma}_0 = \sigma_0 = \frac{e^{-\hbar H_0/kT}}{\mathrm{tr}\{e^{-\hbar H_0/kT}\}} \tag{2.17}
\]

is the equilibrium density operator, tr denotes the trace, T is the absolute temperature, and k is the Boltzmann constant. Replacing $\tilde{\sigma}(t)$ with $\tilde{\sigma}(t) - \tilde{\sigma}_0$ ensures that the spin system relaxes toward thermal equilibrium populations rather than to a distribution where the states are equally populated. The resulting differential equation is then

\[
\frac{d\tilde{\sigma}(t)}{dt} = -\int_0^\infty \overline{[\tilde{H}_1(t), [\tilde{H}_1(t - \tau), \tilde{\sigma}(t) - \tilde{\sigma}_0]]}\,d\tau \tag{2.18}
\]
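The size of the correction introduced by $\tilde{\sigma}_0$ can be illustrated by evaluating (2.17) for an isolated spin-1/2. The sketch below assumes $H_0 = \omega I_z$ with a ¹H Larmor frequency of 600 MHz at 298 K; these values, the neglect of the sign of ω and the function name are choices made for the example.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J / K

def equilibrium_populations(omega, temperature):
    """Diagonal of sigma_0 = exp(-hbar*H0/kT) / tr{exp(-hbar*H0/kT)} for H0 = omega * Iz.

    Returned in the order (m = +1/2, m = -1/2) for a spin-1/2.
    """
    x = HBAR * omega / (K_B * temperature)
    weights = [math.exp(-x * m) for m in (0.5, -0.5)]
    trace = sum(weights)                 # denominator of eq. (2.17)
    return [w / trace for w in weights]

omega_h = 2.0 * math.pi * 600.0e6        # assumed 1H Larmor frequency, rad/s
p_plus, p_minus = equilibrium_populations(omega_h, 298.0)
print("populations:", p_plus, p_minus)
print("population difference:", p_minus - p_plus)
```

Under these assumptions the two populations differ by only a few parts in 10⁵, so the equally populated state reached by the uncorrected theory is numerically close to, but not the same as, the Boltzmann equilibrium.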

Remark 5. Relaxation rate constants for the density matrix elements $\sigma_{ij}$ are on the order of $R_{ij} = \overline{H_1^2(t)}\,\tau_c$. The equation in (2.18) is valid on the time scale $\tau_c$

\[
\left. \frac{2}{3} F^{(0)} I_z - \frac{1}{2} F^{(1)} I_+ + \frac{1}{2} F^{(-1)} I_- \right\} \tag{2.149}
\]

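Table 2.3: Tensor Operators for the CSA Interaction

m    $A_2^{(m)}$          $\omega^{(m)}$    $A_2^{(m)*} = A_2^{(-m)}$    $F_2^{(\pm m)}$
0    $\frac{2}{3} I_z$    0                 $\frac{2}{3} I_z$            $\frac{3}{2}(3\cos^2\theta - 1)$
1    $-\frac{1}{2} I_+$   $\omega_I$        $+\frac{1}{2} I_-$           $\mp 3 \sin\theta \cos\theta\, e^{\pm i\varphi}$
2    0                    $2\omega_I$       0                            $\frac{3}{2} \sin^2\theta\, e^{\pm 2i\varphi}$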

Using the same arguments used for deriving the master equation for dipolar relaxation in (2.58), the unperturbed Hamiltonian can be given as $H_o = \omega_I I_z$ and the tensor operator in the interaction representation is written as

$$\tilde{A}^{(m)} = e^{iH_o t} A^{(m)} e^{-iH_o t} = e^{im\omega_I t} A^{(m)} \tag{2.150}$$

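The master equation in (2.58) for the evolution of some physical variable represented by an operator $Q$ can then be written as

$$\frac{d\langle \tilde{Q} \rangle}{dt} = -\frac{1}{2}\,\xi_{CSA}^2 \sum_m \sum_p j^{(m)}(m\omega_I) \left\{ \left\langle \left[A_p^{(-m)}, \left[A_p^{(m)}, Q\right]\right] \right\rangle - \left\langle \left[A_p^{(-m)}, \left[A_p^{(m)}, Q\right]\right] \right\rangle_0 \right\} \tag{2.151}$$

where the CSA constant $\xi_{CSA}$ is analogous to the dipolar constant $\alpha$ used previously, except that it has been factored out of the equation for simplicity:

$$\xi_{CSA} = \omega_I\,\frac{\sigma_{\parallel} - \sigma_{\perp}}{3} = \omega_I\,\frac{\Delta\sigma}{3} \tag{2.152}$$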

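Making further simplifications using the relations in (2.123)-(2.125), along with the fact that there is only one value of $p$ for each value of $m$ (so that the summation over $p$ is no longer required), (2.151) can be rewritten as

$$\frac{d\langle \tilde{Q} \rangle}{dt} = -\frac{1}{2}\,\xi_{CSA}^2 \sum_m |F^{(m)}|^2 J^{(m)}(m\omega_I) \left\{ \left\langle \left[A^{(-m)}, \left[A^{(m)}, Q\right]\right] \right\rangle - \left\langle \left[A^{(-m)}, \left[A^{(m)}, Q\right]\right] \right\rangle_0 \right\} \tag{2.153}$$

In a manner completely analogous to the development of the dipolar relaxation superoperator, the time evolution of the physical variable $Q$ may be written as

$$\frac{d\langle \tilde{Q} \rangle}{dt} = -\left( \langle \mathcal{A}_Q \rangle - \langle \mathcal{A}_Q \rangle_o \right) \tag{2.154}$$

where

$$\langle \mathcal{A}_Q \rangle = \frac{1}{2}\,\xi_{CSA}^2 \sum_m |F^{(m)}|^2 J^{(m)}(m\omega_I) \left\langle \left[A^{(-m)}, \left[A^{(m)}, Q\right]\right] \right\rangle \tag{2.155}$$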

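In the case of longitudinal relaxation, where $Q = I_z$, the double commutators evaluate as follows:

$$\begin{aligned}
\left[A^{(-0)}, \left[A^{(0)}, I_z\right]\right] &= \left[A^{(0)}, \left[A^{(-0)}, I_z\right]\right] \\
&= \left[\sqrt{\tfrac{2}{3}}\, I_z, \left[\sqrt{\tfrac{2}{3}}\, I_z, I_z\right]\right] \\
&= \tfrac{2}{3}\left[I_z, \left[I_z, I_z\right]\right] \\
&= 0
\end{aligned} \tag{2.156}$$

$$\begin{aligned}
\left[A^{(-1)}, \left[A^{(1)}, I_z\right]\right] &= \left[\tfrac{1}{2} I_-, \left[-\tfrac{1}{2} I_+, I_z\right]\right] \\
&= \tfrac{1}{4}\left[I_-, \left[-I_+, I_z\right]\right] \\
&= \tfrac{1}{4}\left[I_-, I_+\right] \\
&= \tfrac{1}{4}\left(-2 I_z\right) \\
&= -\tfrac{1}{2} I_z
\end{aligned} \tag{2.157}$$

$$\begin{aligned}
\left[A^{(1)}, \left[A^{(-1)}, I_z\right]\right] &= \left[-\tfrac{1}{2} I_+, \left[\tfrac{1}{2} I_-, I_z\right]\right] \\
&= \tfrac{1}{4}\left[-I_+, \left[I_-, I_z\right]\right] \\
&= -\tfrac{1}{4}\left[I_+, I_-\right] \\
&= -\tfrac{1}{4}\left(2 I_z\right) \\
&= -\tfrac{1}{2} I_z
\end{aligned} \tag{2.158}$$

Then, (2.155) becomes

$$\begin{aligned}
\langle \mathcal{A}_{I_z} \rangle_{CSA} &= \frac{1}{2}\,\xi_{CSA}^2\, |F^{(1)}|^2 J^{(1)}(\omega_I) \left\langle -\tfrac{1}{2} I_z - \tfrac{1}{2} I_z \right\rangle \\
&= -\frac{1}{2}\,\xi_{CSA}^2\, |F^{(1)}|^2 J^{(1)}(\omega_I) \langle I_z \rangle
\end{aligned} \tag{2.159}$$

which in turn allows the longitudinal relaxation rate to be obtained from

$$\begin{aligned}
\frac{d\langle I_z \rangle}{dt} &= -\frac{1}{T_1^{CSA}} \left( \langle I_z \rangle - \langle I_z \rangle_o \right) \\
&= -\frac{1}{T_1^{CSA}} \left( \langle I_z \rangle - I_z^o \right)
\end{aligned} \tag{2.160}$$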

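where

$$R_1^{CSA} = \frac{1}{T_1^{CSA}} = -\frac{1}{2}\,\xi_{CSA}^2\, |F^{(1)}|^2 J^{(1)}(\omega_I) \tag{2.161}$$

If the term $|F^{(1)}|^2$ is evaluated as done previously with (2.126), then

$$\begin{aligned}
|F^{(1)}|^2 &= \overline{F^{(1)} F^{(-1)}} \\
&= \int_0^{\pi} \frac{1}{2} \sin\theta\, F^{(1)} F^{(-1)}\, d\theta \\
&= \int_0^{\pi} \frac{1}{2} \sin\theta \left(3 \sin\theta\cos\theta\, e^{-i\varphi}\right)\left(-3 \sin\theta\cos\theta\, e^{i\varphi}\right) d\theta \\
&= -\frac{9}{2} \int_0^{\pi} \sin\theta\, \sin^2\theta \cos^2\theta\, e^{-i\varphi + i\varphi}\, d\theta \\
&= -\frac{9}{2} \int_0^{\pi} \sin^3\theta \cos^2\theta\, d\theta \\
&= -\frac{9}{2} \cdot \frac{1}{16} \left[ \frac{1}{5}\cos 5\theta - \frac{1}{3}\cos 3\theta - 2\cos\theta \right]_0^{\pi} \\
&= -\frac{9}{2} \cdot \frac{1}{16} \left( \frac{64}{15} \right) \\
&= -\frac{6}{5}
\end{aligned} \tag{2.162}$$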

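Substitution of the result from (2.162) into (2.161) yields

$$\begin{aligned}
R_1^{CSA} &= -\frac{1}{2}\,\xi_{CSA}^2\, |F^{(1)}|^2 J^{(1)}(\omega_I) \\
&= -\frac{1}{2}\,\xi_{CSA}^2 \left( -\frac{6}{5} \right) J^{(1)}(\omega_I) \\
&= -\frac{1}{2}\,\frac{\Delta\sigma^2 \omega_I^2}{9} \left( -\frac{6}{5} \right) J^{(1)}(\omega_I) \\
&= \frac{1}{5}\,\frac{\Delta\sigma^2 \omega_I^2}{3}\, J^{(1)}(\omega_I) \\
&= c^2 J(\omega_I)
\end{aligned} \tag{2.163}$$

where the factor 1/5 has been absorbed into the orientational spectral density function defined in (2.135) and the constant $c$ is given by

$$c = \frac{(\sigma_{\parallel} - \sigma_{\perp})\,\omega_I}{\sqrt{3}} = \frac{\Delta\sigma\,\omega_I}{\sqrt{3}} \tag{2.164}$$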

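Using analogous reasoning, a relation for the contribution of the CSA interaction to transverse relaxation may be obtained by using $Q = I_+$ or $I_-$. The double commutators of (2.155) evaluate as:

$$\begin{aligned}
\left[A^{(-0)}, \left[A^{(0)}, I_+\right]\right] &= \left[A^{(0)}, \left[A^{(-0)}, I_+\right]\right] \\
&= \left[\sqrt{\tfrac{2}{3}}\, I_z, \left[\sqrt{\tfrac{2}{3}}\, I_z, I_+\right]\right] \\
&= \tfrac{2}{3}\left[I_z, \left[I_z, I_+\right]\right] \\
&= \tfrac{2}{3}\left[I_z, I_+\right] \\
&= \tfrac{2}{3} I_+
\end{aligned} \tag{2.165}$$

$$\begin{aligned}
\left[A^{(-1)}, \left[A^{(1)}, I_+\right]\right] &= \left[\tfrac{1}{2} I_-, \left[-\tfrac{1}{2} I_+, I_+\right]\right] \\
&= \tfrac{1}{4}\left[I_-, \left[-I_+, I_+\right]\right] \\
&= 0
\end{aligned} \tag{2.166}$$

$$\begin{aligned}
\left[A^{(1)}, \left[A^{(-1)}, I_+\right]\right] &= \left[-\tfrac{1}{2} I_+, \left[\tfrac{1}{2} I_-, I_+\right]\right] \\
&= \tfrac{1}{4}\left[-I_+, \left[I_-, I_+\right]\right] \\
&= -\tfrac{1}{4}\left[I_+, -2 I_z\right] \\
&= \tfrac{1}{2}\left[I_+, I_z\right] \\
&= -\tfrac{1}{2} I_+
\end{aligned} \tag{2.167}$$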
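The double commutators in (2.156)-(2.158) and (2.165)-(2.167) can be checked numerically with explicit spin-1/2 matrices. The following is a minimal sketch, assuming $\hbar = 1$ and the operator definitions $A^{(0)} = \sqrt{2/3}\, I_z$, $A^{(1)} = -\tfrac{1}{2} I_+$ and $A^{(-1)} = \tfrac{1}{2} I_-$ used in the commutator evaluations above; the helper names (`comm`, `double_comm`, `close`) are illustrative only and do not come from the original text.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale(c, a):
    """Multiply every element of matrix a by the scalar c."""
    return [[c * x for x in row] for row in a]

def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def double_comm(a, b, q):
    """Double commutator [a, [b, q]]."""
    return comm(a, comm(b, q))

def close(a, b, tol=1e-12):
    """Element-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

# Spin-1/2 operators in the basis (alpha, beta), with hbar = 1 (assumption).
IZ = [[0.5, 0.0], [0.0, -0.5]]
IP = [[0.0, 1.0], [0.0, 0.0]]
IM = [[0.0, 0.0], [1.0, 0.0]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]

# CSA spin tensor operators as used in the commutator evaluations.
A0 = scale(math.sqrt(2.0 / 3.0), IZ)
A_PLUS1 = scale(-0.5, IP)
A_MINUS1 = scale(0.5, IM)

checks = {
    "(2.156)": close(double_comm(A0, A0, IZ), ZERO),
    "(2.157)": close(double_comm(A_MINUS1, A_PLUS1, IZ), scale(-0.5, IZ)),
    "(2.158)": close(double_comm(A_PLUS1, A_MINUS1, IZ), scale(-0.5, IZ)),
    "(2.165)": close(double_comm(A0, A0, IP), scale(2.0 / 3.0, IP)),
    "(2.166)": close(double_comm(A_MINUS1, A_PLUS1, IP), ZERO),
    "(2.167)": close(double_comm(A_PLUS1, A_MINUS1, IP), scale(-0.5, IP)),
}
print(checks)
```

Each entry of `checks` is `True` when the matrix form of the double commutator agrees with the operator result quoted for that equation.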

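With the double commutators evaluated for each value of $m$, (2.155) may be rewritten as

$$\langle \mathcal{A}_{I_+} \rangle_{CSA} = \frac{1}{2}\,\xi_{CSA}^2 \left( \frac{2}{3}\, |F^{(0)}|^2 J^{(0)}(0) - \frac{1}{2}\, |F^{(-1)}|^2 J^{(1)}(\omega_I) \right) \langle I_+ \rangle \tag{2.168}$$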

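Since the value of $|F^{(-1)}|^2 = |F^{(1)}|^2$ has already been determined in (2.162), it is only necessary to evaluate $|F^{(0)}|^2$ using the relation in (2.126).

$$\begin{aligned}
|F^{(0)}|^2 &= \int_0^{\pi} \frac{1}{2} \sin\theta\, |F^{(0)}|^2\, d\theta \\
&= \int_0^{\pi} \frac{1}{2} \sin\theta \left( \sqrt{\tfrac{3}{2}} \left(3\cos^2\theta - 1\right) \right) \left( \sqrt{\tfrac{3}{2}} \left(3\cos^2\theta - 1\right) \right) d\theta \\
&= \frac{1}{2} \cdot \frac{3}{2} \int_0^{\pi} \sin\theta \left(3\cos^2\theta - 1\right)^2 d\theta \\
&= \frac{3}{4} \int_0^{\pi} \left( \sin\theta - 6 \sin\theta\cos^2\theta + 9 \sin\theta\cos^4\theta \right) d\theta \\
&= \frac{3}{4} \left[ (-\cos\theta) - 2\left(-\cos^3\theta\right) + 9\left(-\frac{1}{5}\cos^5\theta\right) \right]_0^{\pi} \\
&= \frac{3}{4} \left( \frac{8}{5} \right) \\
&= \frac{6}{5}
\end{aligned} \tag{2.169}$$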
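The two orientational averages in (2.162) and (2.169) can be verified by direct numerical quadrature. The sketch below assumes the spatial functions $F^{(0)} = \sqrt{3/2}\,(3\cos^2\theta - 1)$ and $F^{(\pm 1)} = \mp 3 \sin\theta\cos\theta\, e^{\pm i\varphi}$ from Table 2.3 and the averaging weight $\tfrac{1}{2}\sin\theta$ used in the integrals above; the function name `isotropic_average` and the midpoint rule are illustrative choices, not part of the original derivation.

```python
import math

def isotropic_average(f, n=20000):
    """Approximate (1/2) * integral_0^pi sin(theta) f(theta) d(theta) with the midpoint rule."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        total += 0.5 * math.sin(theta) * f(theta)
    return total * h

# |F^(0)|^2 = (3/2) (3 cos^2(theta) - 1)^2
f0_sq = isotropic_average(lambda t: 1.5 * (3.0 * math.cos(t) ** 2 - 1.0) ** 2)

# F^(1) F^(-1) = -9 sin^2(theta) cos^2(theta); the phase factors cancel.
f1_prod = isotropic_average(lambda t: -9.0 * (math.sin(t) * math.cos(t)) ** 2)

print(f0_sq, f1_prod)
```

Within the quadrature error, the two printed values agree with $6/5$ and $-6/5$ respectively, matching the closed-form results.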

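With the evaluations of $|F^{(0)}|^2$ and $|F^{(1)}|^2$ from (2.169) and (2.162) respectively, equation (2.168) becomes

$$\begin{aligned}
\langle \mathcal{A}_{I_+} \rangle_{CSA} &= \frac{1}{2}\,\xi_{CSA}^2 \left( \frac{2}{3} \left( \frac{6}{5} \right) J^{(0)}(0) - \frac{1}{2} \left( -\frac{6}{5} \right) J^{(1)}(\omega_I) \right) \langle I_+ \rangle \\
&= \frac{1}{2}\,\frac{\Delta\sigma^2 \omega_I^2}{9} \left( \frac{4}{5}\, J^{(0)}(0) + \frac{3}{5}\, J^{(1)}(\omega_I) \right) \langle I_+ \rangle
\end{aligned} \tag{2.170}$$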

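Since the equilibrium value $\langle I_+ \rangle_o$ vanishes, the CSA contribution to transverse relaxation may be written as

$$\begin{aligned}
\frac{d\langle I_+ \rangle}{dt} &= -\frac{1}{T_2^{CSA}} \left( \langle I_+ \rangle - \langle I_+ \rangle_o \right) \\
&= -\frac{1}{T_2^{CSA}} \langle I_+ \rangle
\end{aligned} \tag{2.171}$$

where

$$\begin{aligned}
R_2^{CSA} = \frac{1}{T_2^{CSA}} &= \frac{1}{2}\,\frac{\Delta\sigma^2 \omega_I^2}{9} \left( \frac{4}{5}\, J^{(0)}(0) + \frac{3}{5}\, J^{(1)}(\omega_I) \right) \\
&= \frac{1}{2}\,\frac{\Delta\sigma^2 \omega_I^2}{9} \left( 4J(0) + 3J(\omega_I) \right) \\
&= \frac{c^2}{6} \left( 4J(0) + 3J(\omega_I) \right)
\end{aligned} \tag{2.172}$$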

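Finally, the CSA contribution may be used in (2.141) for $\rho_I^*$, as the “miscellaneous” contribution term for the total relaxation rate, to obtain the total longitudinal and transverse relaxation rates. Therefore, from equations (2.136) and (2.163) the total longitudinal relaxation rate is

$$\begin{aligned}
R_1^{II} &= R_{1(\mathrm{DIPOLAR})}^{II} + R_{1(\mathrm{CSA})}^{II} \\
&= \frac{d^2}{4} \left[ J(\omega_I - \omega_S) + 3J(\omega_I) + 6J(\omega_I + \omega_S) \right] + c^2 J(\omega_I)
\end{aligned} \tag{2.173}$$

and from (2.139) and (2.172) the total transverse relaxation rate is

$$\begin{aligned}
R_2^{II} &= R_{2(\mathrm{DIPOLAR})}^{II} + R_{2(\mathrm{CSA})}^{II} \\
&= \frac{d^2}{8} \left[ 4J(0) + J(\omega_I - \omega_S) + 6J(\omega_S) + 3J(\omega_I) + 6J(\omega_I + \omega_S) \right] + \frac{c^2}{6} \left[ 4J(0) + 3J(\omega_I) \right]
\end{aligned} \tag{2.174}$$

with constants $d$ and $c$ defined by (2.132) and (2.164) respectively. The corresponding equation for the total longitudinal relaxation rate of the S spin is similarly obtained from equations (2.138) and (2.163) as

$$\begin{aligned}
R_1^{SS} &= R_{1(\mathrm{DIPOLAR})}^{SS} + R_{1(\mathrm{CSA})}^{SS} \\
&= \frac{d^2}{4} \left[ J(\omega_I - \omega_S) + 3J(\omega_S) + 6J(\omega_I + \omega_S) \right] + c^2 J(\omega_S)
\end{aligned} \tag{2.175}$$

and from equations (2.140) and (2.172) the total transverse relaxation rate of the S spin is

$$\begin{aligned}
R_2^{SS} &= R_{2(\mathrm{DIPOLAR})}^{SS} + R_{2(\mathrm{CSA})}^{SS} \\
&= \frac{d^2}{8} \left[ 4J(0) + J(\omega_I - \omega_S) + 6J(\omega_I) + 3J(\omega_S) + 6J(\omega_I + \omega_S) \right] + \frac{c^2}{6} \left[ 4J(0) + 3J(\omega_S) \right]
\end{aligned} \tag{2.176}$$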

The cross relaxation rate is not affected by CSA interactions and is given by equation (2.137).
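As an illustration of how (2.175) and (2.176) are used in practice, the following sketch evaluates the total $R_1$ and $R_2$ of an S spin taken to be a backbone $^{15}$N with an attached proton (the I spin). Several inputs are assumptions and are not taken from this text: the dipolar constant is taken to have the form $d = (\mu_0/4\pi)\,\hbar\,\gamma_I\gamma_S/r^3$ (the definition in (2.132) is not reproduced here), the CSA constant for the S spin is taken as $c = \Delta\sigma\,\omega_S/\sqrt{3}$ by analogy with (2.164), the bond length (1.02 Å) and $\Delta\sigma$ (−160 ppm) are typical literature-style values chosen only for illustration, magnitudes of the Larmor frequencies are used, and the rigid-sphere spectral density $J(\omega) = \tfrac{2}{5}\,\tau_c/(1 + \omega^2\tau_c^2)$ is used.

```python
import math

# Physical constants (SI units).
MU0_OVER_4PI = 1.0e-7        # T m / A
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1, 1H (I spin)
GAMMA_N = -2.7126e7          # rad s^-1 T^-1, 15N (S spin)

def j_rigid(omega, tau_c):
    """Rigid-sphere spectral density, J(w) = (2/5) tau_c / (1 + w^2 tau_c^2)."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)

def rates_s_spin(b0, tau_c, r_nh=1.02e-10, delta_sigma=-160e-6):
    """Total R1 and R2 of the S spin following (2.175) and (2.176).

    r_nh and delta_sigma are illustrative assumptions, not values from the text.
    """
    w_i = abs(GAMMA_H) * b0          # magnitude of the 1H Larmor frequency
    w_s = abs(GAMMA_N) * b0          # magnitude of the 15N Larmor frequency
    d = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / r_nh ** 3   # assumed form of (2.132)
    c = delta_sigma * w_s / math.sqrt(3.0)                    # (2.164) applied to spin S

    def j(w):
        return j_rigid(w, tau_c)

    r1 = (d ** 2 / 4.0) * (j(w_i - w_s) + 3.0 * j(w_s) + 6.0 * j(w_i + w_s)) \
        + c ** 2 * j(w_s)
    r2 = (d ** 2 / 8.0) * (4.0 * j(0.0) + j(w_i - w_s) + 6.0 * j(w_i)
                           + 3.0 * j(w_s) + 6.0 * j(w_i + w_s)) \
        + (c ** 2 / 6.0) * (4.0 * j(0.0) + 3.0 * j(w_s))
    return r1, r2

r1, r2 = rates_s_spin(b0=14.1, tau_c=5.0e-9)
print(f"R1 = {r1:.2f} s^-1, R2 = {r2:.2f} s^-1")
```

Because only $d^2$ and $c^2$ enter the rates, the signs of the gyromagnetic ratios and of $\Delta\sigma$ do not affect the result.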

2.4 The Steady-State Heteronuclear Nuclear Overhauser Effect

Without delving too deeply into the origin of the nuclear Overhauser effect (NOE), or the derivation of the relaxation rates in terms of the transition probabilities and the Solomon equations, an expression may be derived for the steady-state NOE for the two spin-1/2 system considered thus far. Since the expressions for the auto- and cross-relaxation rates have already been derived in terms of spectral density functions, all of the relations necessary to obtain an expression for the steady-state heteronuclear NOE are already present. For a complete description of the origin and derivation of the NOE, the reader is referred to the extensive explanation in [171], on which this discussion is based.

In an effort to avoid the complete derivation of the NOE enhancement, some preliminary points must be made without justification [171].

(i) The intensity of a resonance in an NMR spectrum is directly proportional to the population differences between the energy levels involved in the transition.

(ii) The rate at which these populations return to their equilibrium values following a perturbation is determined by a transition probability $W$. In the present treatment these rates have instead been described by the frequency of the transition and the corresponding spectral density function describing the motion. One could arrive at the same results by describing the evolution of the spin operators in terms of transition probabilities and populations.

(iii) For dipolar relaxation, the transition probability is dependent on (among other factors) the strength of the local field fluctuating at the frequency of the transition. This local field is the field at the site of one dipole due to the presence of the other dipole.

(iv) The frequency corresponding to the transition is proportional to the energy difference between the two states (Bohr frequency condition).

[Figure 2.2: energy level diagram of the four spin states of the two-spin system, with the transitions between them labelled $W_1^I$, $W_1^S$, $W_0^{IS}$ and $W_2^{IS}$.]

Figure 2.2: Energy level diagram showing transition probabilities (W) for spin eigenstates α and β. The $W_1^I$ and $W_1^S$ probabilities are associated with single-quantum transitions of spins I and S. The probabilities $W_0^{IS}$ and $W_2^{IS}$ are for zero-quantum transitions (‘flip-flops’) and double-quantum transitions (‘flip-flips’), respectively. Only single-quantum transitions are considered ‘allowed’ transitions. The zero- and double-quantum transitions occur via cross-relaxation.

It is assumed that the two spins in the system are close enough in space that their dipole-dipole interaction is appreciable. In other words, the spins are dipole-dipole coupled but not necessarily scalar coupled. It is further assumed that these spins are part of a rigid molecule tumbling isotropically. From the energy level diagram in Figure 2.2 (which is the same as that of Figure 2.1 except that the transition frequencies have been replaced by transition probabilities) it is seen that there are two transitions that involve simultaneous flips of both spins. These are the zero-quantum (αβ ↔ βα) and double-quantum (αα ↔ ββ) transitions, with transition probabilities $W_0^{IS}$ and $W_2^{IS}$ respectively. These transitions are central to the NOE enhancement because they allow the saturation of spin S to affect the intensity of spin I. The zero- and double-quantum transitions are both referred to as cross-relaxation pathways.

Remark 23. The zero- and double-quantum transitions are forbidden in the conventional sense and thus cannot be directly excited by an rf-pulse to give an NMR signal. However, the transitions are not forbidden in terms of relaxation mechanisms. The selection rules that govern the interactions of the spins with the lattice differ from those that apply to the interaction with the external oscillating field.

Definition. The NOE enhancement, $f_I\{S\}$, is defined as the fractional change in the intensity of I on saturating S and is given by

$$f_I\{S\} = \frac{I - I^o}{I^o} \tag{2.177}$$

where $I^o$ is the equilibrium intensity of I.

The intensity of I is proportional to the sum of the population differences of the energy levels involved in the transition.

$$I \propto (N_{\alpha\alpha} - N_{\beta\alpha}) + (N_{\alpha\beta} - N_{\beta\beta}) \tag{2.178}$$

The intensity of S can be obtained similarly from

$$S \propto (N_{\alpha\alpha} - N_{\alpha\beta}) + (N_{\beta\alpha} - N_{\beta\beta}) \tag{2.179}$$

At thermal equilibrium, the intensities $I^o$ and $S^o$ are related by

$$\frac{I^o}{S^o} = \frac{\gamma_I}{\gamma_S} \tag{2.180}$$

When spin S is saturated, the populations of the αα and αβ levels are equalized; the populations of the ββ and βα levels are similarly equalized. By equalizing these sets of populations, the ββ and αβ populations are increased and consequently the αα and βα populations are decreased. The single-quantum transitions $W_1^I$ and $W_1^S$ only produce independent spin-lattice relaxation of spins I and S respectively. However, if the double-quantum transition ($W_2^{IS}$) occurs, it will act to restore the αα and ββ populations to their equilibrium values, decreasing the ββ and increasing the αα populations. The net result is an increase in the population differences $(N_{\alpha\alpha} - N_{\beta\alpha})$ and $(N_{\alpha\beta} - N_{\beta\beta})$ and hence an increase in the intensity of the I resonance (i.e., a positive NOE enhancement of the I signal). By analogous arguments, it can be shown that the zero-quantum transition ($W_0^{IS}$) decreases the intensity of I upon saturation of S and thus gives rise to negative NOE enhancements.

The intensities of I and S are proportional to $I_z$ and $S_z$ respectively, immediately prior to the observe rf-pulse. Consequently, the vectors $I_z$ and $S_z$ will also be proportional to the population differences between the states. If one were to go through the derivation of the Solomon equations, one would arrive at the time evolution of the $I_z$ and $S_z$ vectors in terms of transition probabilities to obtain the relation [171]

$$\frac{d\langle I_z \rangle}{dt} = -\left( 2W_1^I + W_2^{IS} + W_0^{IS} \right) \left( \langle I_z \rangle - I_z^o \right) - \left( W_2^{IS} - W_0^{IS} \right) \left( \langle S_z \rangle - S_z^o \right) \tag{2.181}$$

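Notice the similarity to the expression for longitudinal relaxation of spin I in (2.93). Here the relaxation is described in terms of the transition probabilities rather than the spectral densities of motions, but the end result is the same:

$$\frac{d\langle I_z \rangle}{dt} = -R_1^{II} \left( \langle I_z \rangle - I_z^o \right) - R_1^{IS} \left( \langle S_z \rangle - S_z^o \right) \tag{2.182}$$

If S is saturated with a weak rf-pulse (so as to avoid perturbing I) for a period of time $t$ such that $t \gg 1/R_1^{II}$ and $1/R_1^{SS}$, then the populations across the S transitions become equalized and the I spin evolves to a steady-state value $\langle I_z \rangle_{ss}$. Under these conditions,

$$\frac{d\langle I_z \rangle_{ss}}{dt} = 0, \qquad \langle S_z \rangle = 0 \tag{2.183}$$

$$\begin{aligned}
\frac{d\langle I_z \rangle_{ss}}{dt} = -R_1^{II} \left( \langle I_z \rangle_{ss} - I_z^o \right) - R_1^{IS} \left( 0 - S_z^o \right) &= 0 \\
-R_1^{II} \left( \langle I_z \rangle_{ss} - I_z^o \right) &= R_1^{IS} \left( -S_z^o \right) \\
R_1^{II} \left( \langle I_z \rangle_{ss} - I_z^o \right) &= R_1^{IS} \left( S_z^o \right) \\
\frac{\langle I_z \rangle_{ss} - I_z^o}{S_z^o} &= \frac{R_1^{IS}}{R_1^{II}}
\end{aligned} \tag{2.184}$$

Using the relation in (2.180) we have $S_z^o = (\gamma_S/\gamma_I)\, I_z^o$, and upon substitution into (2.184) we obtain

$$\begin{aligned}
\frac{\langle I_z \rangle_{ss} - I_z^o}{(\gamma_S/\gamma_I)\, I_z^o} &= \frac{R_1^{IS}}{R_1^{II}} \\
\frac{\langle I_z \rangle_{ss} - I_z^o}{I_z^o} &= \frac{\gamma_S}{\gamma_I}\, \frac{R_1^{IS}}{R_1^{II}} = f_I\{S\}
\end{aligned} \tag{2.185}$$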

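Although the nature of the auto-relaxation parameter $R_1^{II}$ has not been specified at this point, it refers to the dipolar relaxation parameter as described by the transition probabilities $(2W_1^I + W_2^{IS} + W_0^{IS})$. However, (2.185) could be easily modified to include a “miscellaneous” relaxation contribution such as the CSA interaction. Thus, upon substitution of (2.173) and (2.137) into (2.185), the steady-state NOE is obtained in the form

$$\begin{aligned}
f_I\{S\} = \frac{\langle I_z \rangle_{ss} - I_z^o}{I_z^o} &= \frac{\gamma_S}{\gamma_I}\, \frac{R_1^{IS}}{R_{1(\mathrm{DIPOLAR})}^{II} + R_{1(\mathrm{CSA})}^{II}} \\
&= \frac{\gamma_S}{\gamma_I}\, \frac{\frac{d^2}{4} \left[ -J(\omega_I - \omega_S) + 6J(\omega_I + \omega_S) \right]}{\frac{d^2}{4} \left[ J(\omega_I - \omega_S) + 3J(\omega_I) + 6J(\omega_I + \omega_S) \right] + c^2 J(\omega_I)} \\
&= \frac{\gamma_S}{\gamma_I}\, \frac{-J(\omega_I - \omega_S) + 6J(\omega_I + \omega_S)}{J(\omega_I - \omega_S) + 3J(\omega_I) + 6J(\omega_I + \omega_S) + \frac{4c^2}{d^2} J(\omega_I)}
\end{aligned} \tag{2.186}$$

2.5 Lipari-Szabo Model-Free Formalism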

Recall from the definition of the power spectral density function in (2.45), describing the contribution to orientational dynamics of the molecular motions with frequency components in the $\omega$ to $\omega + d\omega$ range, that

$$j^{(q)}(\omega) = \mathrm{Re} \left\{ \int_{-\infty}^{\infty} \overline{F_k^{(-q)}(t)\, F_k^{(q)}(t + \tau)}\; e^{-i\omega\tau}\, d\tau \right\} \tag{2.187}$$

For relaxation in isotropic liquids at the high temperature limit,

$$j^{(q)}(\omega) = (-1)^q j^{(0)}(\omega) \equiv (-1)^q j(\omega) \tag{2.188}$$

where $j(\omega)$ is the auto-spectral density function [169]. The consequence of (2.188) is that only one auto-spectral density needs to be calculated. As mentioned in section 2.2.4, the spatial functions $F_2^{(q)}$ arise from tensor operators of rank $k = 2$ and may then be expressed in terms of spherical harmonics.

$$F_2^{(0)} = c_0(t)\, Y_2^0[\Omega(t)] \tag{2.189}$$

where $\Omega(t)$ represents the time variation of the polar angles $\theta(t)$ and $\varphi(t)$ in the laboratory reference frame, which define the orientation of the unit vector involved in the interaction (i.e. in the direction of the internuclear bond vector connecting spins I and S for the dipolar interaction). With (2.189) the auto-spectral density $j(\omega)$ can then be expressed as

$$\begin{aligned}
j(\omega) &= \mathrm{Re} \left\{ \int_{-\infty}^{\infty} \overline{c_0(t)\, c_0(t + \tau)\, Y_2^0[\Omega(t)]\, Y_2^0[\Omega(t + \tau)]}\; e^{-i\omega\tau}\, d\tau \right\} \\
&= \mathrm{Re} \left\{ \int_{-\infty}^{\infty} C(\tau)\, e^{-i\omega\tau}\, d\tau \right\}
\end{aligned} \tag{2.190}$$

where the stochastic correlation function $C(\tau)$ has been introduced and is defined [169] as

$$C(\tau) = \overline{c_0(t)\, c_0(t + \tau)\, Y_2^0[\Omega(t)]\, Y_2^0[\Omega(t + \tau)]} \tag{2.191}$$

For a rigid spherical molecule undergoing Brownian rotational motion, $c_0(t) = c_0$ and the auto-spectral density function [169] can be described by the orientational spectral density function introduced in (2.135)

$$j(\omega) = d^2 J(\omega) \tag{2.192}$$

where $d$ is the constant from (2.132) and is equal to $c_0$. The corresponding orientational correlation function is defined as

$$C_0(\tau) = 4\pi\, \overline{Y_2^0[\Omega(t)]\, Y_2^0[\Omega(t + \tau)]} \tag{2.193}$$

where the spherical harmonics defined in (2.122) are being used. Again, it is assumed that the correlation function takes the form of $e^{-|\tau|/\tau_c}$ as was done in (2.133). However, in this case the normalization factor $1/(2k + 1)$ (related to the rank $k$ of the tensor or the order of the spherical harmonic) must be applied² to the spherical harmonics in equation (2.122). Then for $k = 2$, the orientational correlation function is

$$C_0(\tau) = \frac{1}{5}\, e^{-|\tau|/\tau_c} \tag{2.194}$$

Upon Fourier transformation the same result as in (2.135) is obtained

$$J(\omega) = \frac{2}{5}\, \frac{\tau_c}{1 + \omega^2 \tau_c^2}$$

Since proteins are not rigid spheres but contain internal dynamics in addition to the overall rotational correlation of the molecule, a description of the internal dynamics is required. If the overall motion of the molecule is isotropic and the internal motions differ from the overall motions by at least two orders of magnitude, then the stochastic correlation function is separable and can be written as

$$C(\tau) = C_O(\tau)\, C_I(\tau) \tag{2.195}$$

In other words, the overall correlation function $C_O(\tau)$ and the internal correlation function $C_I(\tau)$ are said to be stochastically independent. The overall correlation is the same as the function defined in (2.194). However, the internal correlation function does not present itself so straightforwardly. The reason for this is that, with the overall correlation function, an idealized spherical top tumbling isotropically in the laboratory reference frame was assumed. In the case of internal motions, some description (a model) of each of the transition sites within the molecule is required. However, if the internal motions could be expressed analogously to the overall motion as something with similar exponential character, this would lead to a summation over all sites in the molecule

$$C_I(t) = \sum_i a_i\, e^{-t/\tau_i} \tag{2.196}$$

² The additional factor $1/(2k + 1)$ is required due to the fact that Lipari and Szabo did not use the normalization factor $\sqrt{(2k + 1)/4\pi}$ in their definition of the spherical harmonics $Y_k^q(\theta, \varphi)$ as was done in (2.122).

The length of this expansion and the magnitudes of the amplitudes of motion ($a_i$) and correlation times ($\tau_i$) depend on the nature of the motion and require a model description. However, it is possible to infer some properties of the internal correlation function that are model independent (or model-free) if it is assumed that the internal motions are on a faster time-scale than the overall motions. The first inference that may be made is that at $t = 0$ the normalized correlation function is $C_I(0) = 1$. The second inference is that for long times ($t = \infty$) the internal correlation function is $C_I(\infty) = S^2$, where $S$ is the generalized order parameter. $S^2$ describes the model-independent behaviour of the internal correlation function $C_I(t)$.

When the internal motion of the molecule is completely unrestricted, the internuclear vector will sample all possible orientations with equal probability; the internal correlation function will go to zero and $S^2 = 0$. If, on the other hand, the internal motion is completely restricted as in a rigid molecule, then the function of the orientational probability will vanish for all orientations not equal to the $t = 0$ orientation. In this case, $C_I(t) = C_I(0) = 1$ and $S^2 = 1$.

In the Lipari-Szabo model-free formalism, the internal correlation function is approximated by a single exponential with correlation time $\tau_e$ that decays toward $S^2$ as $t \to \infty$ [175, 176]. The Lipari-Szabo internal correlation function is then

$$C_I(t) = S^2 + (1 - S^2)\, e^{-t/\tau_e} \tag{2.197}$$

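With this approximation to the internal correlation function, the total correlation function can be expressed as the product of the overall and internal correlation functions as in (2.195) to obtain the auto-spectral density function

$$\begin{aligned}
j(\omega) &= \mathrm{Re} \left\{ \int_{-\infty}^{\infty} C(\tau)\, e^{-i\omega\tau}\, d\tau \right\} \\
&= \mathrm{Re} \left\{ \int_{-\infty}^{\infty} C_O(\tau)\, C_I(\tau)\, e^{-i\omega\tau}\, d\tau \right\} \\
&= \frac{2}{5} \left[ \frac{S^2 \tau_c}{1 + \omega^2 \tau_c^2} + \frac{(1 - S^2)\, \tau}{1 + \omega^2 \tau^2} \right]
\end{aligned} \tag{2.198}$$

where (2.194) has been used for the overall correlation function and

$$\frac{1}{\tau} = \frac{1}{\tau_c} + \frac{1}{\tau_e} \tag{2.199}$$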

The Lipari-Szabo formalism thus results in a description of the motion of the protein in terms of the spatial restriction of internal motion (S2 ), the overall correlation time τc and the effective internal correlation time τe . However, the formalism hinges on the assumption that the overall and internal motions are stochastically independent with the internal motions being on a faster timescale than the overall motion.
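The model-free spectral density in (2.198)-(2.199) and the steady-state NOE in (2.186) can be combined in a short numerical sketch. The function and parameter names below are illustrative; the example assumes the observed spin plays the role of I in (2.186) and the saturated spin the role of S, uses magnitudes of the Larmor frequencies, and takes the ratio $c^2/d^2$ as an input (the value 0.24 used in the example is an assumed, illustrative number for a backbone $^{15}$N at 14.1 T, not a value from this text).

```python
import math

def j_model_free(omega, tau_c, s2, tau_e):
    """Lipari-Szabo spectral density, equations (2.198) and (2.199)."""
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)
    return 0.4 * (s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
                  + (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2))

def noe_enhancement(j, w_obs, w_sat, gamma_obs, gamma_sat, c2_over_d2=0.0):
    """Steady-state NOE enhancement following (2.186).

    j is a callable spectral density; the observed spin takes the role of I
    and the saturated spin the role of S.
    """
    numerator = -j(w_obs - w_sat) + 6.0 * j(w_obs + w_sat)
    denominator = (j(w_obs - w_sat) + 3.0 * j(w_obs) + 6.0 * j(w_obs + w_sat)
                   + 4.0 * c2_over_d2 * j(w_obs))
    return (gamma_sat / gamma_obs) * numerator / denominator

# Illustrative example: 15N observed, 1H saturated, B0 = 14.1 T (assumed values).
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1
GAMMA_N = -2.7126e7          # rad s^-1 T^-1
B0 = 14.1
w_h, w_n = abs(GAMMA_H) * B0, abs(GAMMA_N) * B0

def j_example(w):
    # Overall tumbling of 5 ns, S^2 = 0.85, internal motion of 50 ps (assumed).
    return j_model_free(w, tau_c=5.0e-9, s2=0.85, tau_e=50.0e-12)

f_noe = noe_enhancement(j_example, w_n, w_h, GAMMA_N, GAMMA_H, c2_over_d2=0.24)
print(f"NOE enhancement f = {f_noe:.3f}, intensity ratio 1 + f = {1.0 + f_noe:.3f}")
```

Setting `s2=1.0` removes the internal-motion term, so `j_model_free` then reduces to the rigid-sphere spectral density obtained from (2.194).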

2.6 Relaxation in the Rotating Frame

Remark 24. The following development of relaxation in the rotating frame is not as complete as the previous development of the longitudinal and transverse relaxation rates (Sections 2.2 and 2.3). In this section many of the results from the previous sections are utilized to simplify the development of the rotating frame relaxation rates. Even with the aid of the results of the previous sections, the following ‘skeleton outline’ of the development of the rotating frame relaxation rates is still lengthy.

In the previous sections it has been shown how heteronuclear relaxation rates probe dynamics, over a wide time-scale, through the behaviour of the spectral densities. These spectral densities provide measurements of the motional behaviour of proteins in solution and can be obtained from relaxation rates of protonated $^{15}$N (in particular the backbone amide), which are dominated by dipole-dipole interactions between the nitrogen and its attached proton. Traditionally, protein dynamics are investigated through measurements of the longitudinal ($R_1$) and transverse ($R_2$) relaxation rates and the steady-state heteronuclear NOE [177]. The transverse relaxation rate is usually obtained through a Carr-Purcell-Meiboom-Gill (CPMG) sequence [178] where the dependence of the $R_2$ rate on the CPMG delay probes conformational exchange processes on the order of $10^3$ to $10^4$ Hz [179].

Alternatively, the transverse relaxation rate can be obtained through on-resonance spin-lock based sequences [180] where the relaxation rate in the rotating frame (strictly, a doubly tilted rotating frame) is measured. In this situation, the measurement of the spin-lock relaxation rate, $R_{1\rho}$, as a function of the spin-lock amplitude results in an equivalent probe of the spin-spin relaxation. The range of exchange rates observed by the spin-lock relaxation rate is extended if the measurements are made along an effective field tilted away from the static field axis. The spin-lock relaxation rate, $R_{1\rho}$, is thus measured along the effective field and as such is referred to as longitudinal relaxation in the rotating frame.

Without going into the same level of detail in describing the derivation of the relaxation rates in terms of spectral densities, it will be shown that the on-resonance $R_{1\rho}$ rate is equivalent to the transverse relaxation rate with some added benefits. For a more complete description of $^{15}$N longitudinal relaxation in the rotating frame, see [180–182], from which this discussion is derived.

In order to obtain an expression for the relaxation rate in the rotating frame in terms of spectral densities, the laboratory frame must first be transformed to an appropriate interaction frame in which the magnetic field terms from the Hamiltonian perturbation have vanished. To begin, it is necessary to revisit the expression of the laboratory frame Hamiltonian in (2.1)

$$H(t) = H_0 + H_{rf}(t) + H_1(t)$$

where $H_0$ is the Zeeman term, $H_{rf}(t)$ is the time-dependent Hamiltonian of an applied rf-field and $H_1(t)$ is also a time-dependent perturbation to the main Hamiltonian. In the previous sections describing dipolar relaxation, it was assumed that there was no external rf-field, but here that simplification is not possible.

The rotating frame is usually defined with its Z-axis parallel to the static field $B_0$ (i.e., coincident with the laboratory z-axis). A spin-lock field $B_1$ is applied perpendicular to the static field $B_0$ (see Figure 2.3). For the current discussion, consider the spin-lock to be applied selectively such that it affects only the S spins (in this case $^{15}$N). The $H_{rf}(t)$ term is then given by

$$H_{rf}(t) = \omega_1 \left( S_x \cos(\omega_0 t) + S_y \sin(\omega_0 t) \right) \tag{2.200}$$

where $\omega_1 = \gamma_S B_1$ and $\omega_0$ is the carrier frequency. The resultant effective field, $B_{eff}$, is then tipped away from the static field by an angle $\beta$ and has an x-component of $-\omega_1/\gamma_S$ and a z-component of $-(\omega_S - \omega_0)/\gamma_S$, where $\omega_S$ is the Larmor frequency of the S spin.

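[Figure 2.3: vector diagram in the rotating frame with axes x, y and z, showing $B_0$ along z, the spin-lock field $B_1 = -\omega_1/\gamma_S$ along x, the z-component $-(\omega_S - \omega_0)/\gamma_S$, and the effective field $B_{eff}$ tilted from the z-axis by the angle $\beta$.]

Figure 2.3: Effective magnetic field vector ($B_{eff}$) in the rotating frame resulting from an applied spin-lock ($B_1$) perpendicular to the static field. The effective field is tilted away from the static field vector ($B_0$) by an angle $\beta$.

The tip angle $\beta$ is then given by

$$\tan\beta = \frac{\omega_1}{\omega_S - \omega_0} \tag{2.201}$$

and the frequency of the effective field is then

$$\omega_e = \gamma_S B_{eff} = \sqrt{\omega_1^2 + (\omega_S - \omega_0)^2} \tag{2.202}$$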
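The tilt angle and effective-field frequency of (2.201) and (2.202) are straightforward to evaluate numerically. The sketch below is illustrative only: the function name `effective_field` and the example spin-lock amplitude are assumptions, and `atan2` is used so that the on-resonance case ($\omega_S = \omega_0$) returns $\beta = \pi/2$ without a division by zero.

```python
import math

def effective_field(omega_1, offset):
    """Return (beta, omega_e) from the spin-lock amplitude omega_1 and
    the resonance offset omega_S - omega_0, both in rad/s.

    beta follows tan(beta) = omega_1 / offset, equation (2.201);
    omega_e = sqrt(omega_1^2 + offset^2), equation (2.202).
    """
    beta = math.atan2(omega_1, offset)
    omega_e = math.hypot(omega_1, offset)
    return beta, omega_e

# Example: a 2 kHz spin-lock field (assumed value), on resonance and 2 kHz off resonance.
omega_1 = 2.0 * math.pi * 2000.0
beta_on, omega_e_on = effective_field(omega_1, 0.0)
beta_off, omega_e_off = effective_field(omega_1, 2.0 * math.pi * 2000.0)
print(math.degrees(beta_on), math.degrees(beta_off))
```

On resonance the effective field coincides with the spin-lock field, so $\omega_e = \omega_1$; as the offset grows, $\beta$ decreases toward zero and the effective field approaches the static field axis.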

To obtain a representation of the Hamiltonian in a reference frame in which the applied fields have vanished, three successive rotational transformations must be carried out on those spins experiencing the spin-lock. Conventional transformation to the rotating frame will only remove the dependence on the static field $B_0$. In order to remove the dependence on the spin-lock field $B_1$, a transformation to a doubly rotating frame results in an expression for the Hamiltonian in which only the local perturbing operators remain.

Using the same expressions for the Zeeman Hamiltonian and tensor operators as in (2.63) to (2.66), the laboratory frame Hamiltonian is defined with the additional spin-lock Hamiltonian term defined as in (2.200) with $\omega_0 = \omega_S$. If the rotations are considered in terms of the Euler angles $\alpha$, $\beta$ and $\gamma$, then the first rotation is through an angle $\alpha(t) = \omega_S t$ about the laboratory z-axis. Hence $\{x, y, z\} \to \{X, Y, Z\}$, which is the conventional transformation to the rotating frame with the laboratory z-axis coincident with the rotating frame Z-axis. The second transformation is through an angle $\beta$ about the new Y-axis such that $\{X, Y, Z\} \to \{X^*, Y^*, Z^*\}$. This rotation effectively tips the Z-axis to be coincident with $B_{eff}$. Then a third rotation through an angle $\gamma(t) = \omega_e t$ about the $Z^*$-axis results in the final doubly tilted rotating frame where $\{X^*, Y^*, Z^*\} \to \{x', y', z'\}$. Thus, the rotations can be summarized by the rotational operator $U^S$

$$U^S = e^{i\omega_e t S_z}\, e^{i\beta S_y}\, e^{i\omega_S t S_z} \tag{2.203}$$

An additional rotational operator $U^I$ is also applied to the Hamiltonian, which corresponds to the rotation through an angle $\delta(t) = \omega_I t$ about the laboratory z-axis such that

$$U^I = e^{i\omega_I t I_z} \tag{2.204}$$

The rotation by the operator in (2.204) corresponds to the single rotation of the I spin (proton). The I spin is not affected by the spin-lock and therefore undergoes no additional rotations.

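Therefore, the combined rotational operator for the I and S spins is of the form

$$\begin{aligned}
U = U^S U^I &= e^{i\omega_e t S_z}\, e^{i\beta S_y}\, e^{i\omega_S t S_z}\, e^{i\omega_I t I_z} \\
&= e^{i\omega_e t S_z}\, e^{i\beta S_y}\, e^{i(\omega_I I_z + \omega_S S_z) t}
\end{aligned} \tag{2.205}$$

Hence, the Hamiltonian will transform according to

$$H' = U H U^{\dagger} \tag{2.206}$$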

The development of the dipolar relaxation rate is essentially the same as that given in Section 2.2.2 except that the tensor operators (the $A^{(q)}$'s of (2.61)) need to be transformed into the doubly tilted rotating frame. To obtain the transformed tensor operators resulting from a rotational transformation $R$ about the y-axis by an angle $\beta$ on spin S, the following relations are used [168, 182–184]:

$$R_y^S(\beta) = \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix} \tag{2.207}$$

$$R_y^S(\beta)\, S_z\, R_y^{S\dagger}(\beta) = \sin\beta\, S_x + \cos\beta\, S_z \tag{2.208}$$

$$R_y^S(\beta)\, S_{\pm}\, R_y^{S\dagger}(\beta) = \cos\beta\, S_x \pm i S_y - \sin\beta\, S_z \tag{2.209}$$

$$S_x = \frac{S_+ + S_-}{2} \tag{2.210}$$

$$i S_y = \frac{S_+ - S_-}{2} \tag{2.211}$$

$$\sin^4\!\left( \frac{\beta}{2} \right) = \left( \frac{\cos\beta - 1}{2} \right)^2 \tag{2.212}$$

$$\cos^4\!\left( \frac{\beta}{2} \right) = \left( \frac{\cos\beta + 1}{2} \right)^2 \tag{2.213}$$

The spin tensor operators $A^{(q)}$ in (2.61) can be transformed [185] to $B^{(q)}$ according to

$$B^{(q)} = R_y^S(\beta)\, A^{(q)}\, R_y^{S\dagger}(\beta) \tag{2.214}$$

The end results for the $B^{(q)}$'s, after considerable simplification, are expressions that are linear combinations of the $A_p^{(q)}$'s.



B (0) = RyS (β)A(0) RyS (β)

7 1 ! cos β − 1 " 6 6 7 1 (−1) (1) (2) (−2) + − sin β A0 + A0 A0 + A0 = 3 3 2 "6 ! 7 7 6 cos β − 1 1 (0) (−0) (1) (−1) + − sin β A1 + A1 A1 + A1 2 6 (0) cos βA0

(2.215)



B (1) = RyS (β)A(1) RyS (β) ! ! " " cos β + 1 cos β − 1 3 (0) (1) (1) (−1) A0 + A0 + A0 + cos βA1 = 2 2 2

(2.216)

(0)

(2)

+ sin βA0 + 3 sin βA1

B

(−1)

=

!

+

(−2) sin βA0

cos β + 1 2

"

+

(−1) A0

+

!

cos β − 1 2

"

3 (0) (1) (−1) A0 + A0 + cos βA1 2

(2.217)

(0) 3 sin βA1



B (2) = RyS (β)A(2) RyS (β) ! " " ! cos β + 1 cos β − 1 1 (2) (0) (1) = A0 + 3 A1 − sin βA1 2 2 2


(2.218)

B

(−2)

=

!

cos β + 1 2

"

(−2) A0

+3

!

cos β − 1 2

"

(0)

A1 −

1 (−1) sin βA1 2

(2.219)

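Now we re-express (2.58) in terms of the transformed spin tensor operators:

$$\frac{d\langle \tilde{Q} \rangle}{dt} = -\frac{1}{2} \sum_q \sum_p j^{(q)}(\omega_p^{(q)}) \left\{ \left\langle \left[B_p^{(-q)}, \left[B_p^{(q)}, Q\right]\right] \right\rangle - \left\langle \left[B_p^{(-q)}, \left[B_p^{(q)}, Q\right]\right] \right\rangle_0 \right\} \tag{2.220}$$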

For longitudinal relaxation in the rotating frame, the differential equation for the relaxation of spin S by dipolar interaction with spin I is obtained with $Q = S_z$. As such, analogous to equation (2.84) we have

$$\frac{d\langle S_z \rangle_{\rho}}{dt} = -\left( \langle \mathcal{B}_z^S \rangle - \langle \mathcal{B}_z^S \rangle_o \right) \tag{2.221}$$

and analogous to equation (2.85), the relaxation superoperator is given by

$$\mathcal{B}_z^S = \frac{1}{2} \sum_q \sum_p j^{(q)}(\omega_p^{(q)}) \left[B_p^{(-q)}, \left[B_p^{(q)}, S_z\right]\right] \tag{2.222}$$

which has an expectation value given by

$$\langle \mathcal{B}_z^S \rangle = \frac{1}{2} \sum_q \sum_p j^{(q)}(\omega_p^{(q)}) \left\langle \left[B_p^{(-q)}, \left[B_p^{(q)}, S_z\right]\right] \right\rangle \tag{2.223}$$

Remark 25. Note in (2.221) that the notation for the relaxation superoperator has been changed from $\mathcal{A}$ to $\mathcal{B}$ to denote the fact that the operator is for the doubly tilted rotating frame. For this reason, the subscript $\rho$ has also been added to $d\langle S_z \rangle/dt$ to avoid confusion with (2.99).

Since the expressions for the transformed spin tensor operators are simply linear combinations of the $A_p^{(q)}$'s used previously in evaluating the double commutators with $Q = I_z$, it is not necessary to re-evaluate all of the double commutators. Instead, it is simply a matter of interchanging the I and S terms and including the coefficients of the linear combinations.

Therefore, as in Section 2.2, the relaxation superoperator BzS may be written as 2BzS

! " 8 29 4α2 α2 β 2 (0) 4 j (0) (ωI − ωS + ωe )(Sz + Iz ) = sin βj (ωe ) Iz Sz + sin 9 18 2 ! " ! " 8 9 α2 β β (0) 2 4 4 + j (ωI − ωS − ωe )(Sz − Iz ) + 4α cos j (1) (ωS + ωe ) Iz2 Sz cos 18 2 2 ! " 2 8 9 β α + 4α2 sin4 j (1) (ωs − ωe ) Iz2 Sz + sin2 βj (1) (ωI + ωe )(Sz + Iz ) 2 2 2 2 α α sin2 βj (1) (ωI − ωe )(Sz − Iz ) + cos4 βj (2) (ωI + ωS + ωe )(Sz + Iz ) + 2 2 ! " α2 β + sin4 j (2) (ωI + ωS − ωe )(Sz − Iz ) (2.224) 4 2

Recalling that the terms j (q) (ω) = |F (q) |2 J (q) (ω) as defined in (2.128) to (2.130), that I(I + 1) = S(S + 1) = 3/4, and noting that the spatial functions F (q) are invariant under the transformation to the doubly tilted rotating frame, (2.224) may be simplified to / ! " γI2 γS2 !2 6 µo 7 1 2 (0) 1 4 β = sin βJ (ωe ) + sin J (0) (ωI − ωS + ωe ) 5r6 4π 2 4 2 ! " ! " ! " β 1 4 β 3 3 4 β 4 (0) (1) + sin J (ωI − ωS − ωe ) + cos J (ωS + ωe ) + sin J (1) (ωS − ωe ) 4 2 4 2 4 2 ! " 3 2 (1) 3 3 2 (1) β 4 + sin βJ (ωI + ωe ) + sin βJ (ωI − ωe ) + cos J (2) (ωI + ωS + ωe ) 8 8 2 2 ! " 0 3 4 β (2) + sin J (ωI + ωS − ωe ) 'Sz ( 2 2 / ! " ! " 1 γI2 γS2 !2 6 µo 7 1 4 β β (0) 4 J (ωI − ωS + ωe ) − cos J (0) (ωI − ωS − ωe ) + sin 6 5r 4π 4 2 4 2 ! " 3 3 β 3 + sin2 βJ (0) (ωI + ωe ) + sin2 βJ (0) (ωI − ωe ) + cos4 J (2) (ωI + ωS + ωe ) 8 8 2 2 0 ! " 3 4 β (2) − sin J (ωI + ωS − ωe ) (2.225) 2 2 'BzS (

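The relations in (2.134) and (2.135) are applied to obtain the expression

$$J(\omega) = \frac{1}{5}\, J^{(q)}(\omega) \tag{2.226}$$

Using (2.226) along with the definition of $d$ from (2.132) and some rearranging of terms, the longitudinal relaxation rate equation for the rotating frame may be expressed, analogous to (2.93), as

$$\begin{aligned}
\frac{d\langle S_z \rangle_{\rho}}{dt} &= -\left( \langle \mathcal{B}_z^S \rangle - \langle \mathcal{B}_z^S \rangle_o \right) \\
&= -\frac{1}{T_{1\rho}^{SS}} \left( \langle S_z \rangle - \langle S_z \rangle_o \right) - \frac{1}{T_{1\rho}^{SI}} \left( \langle I_z \rangle - \langle I_z \rangle_o \right)
\end{aligned} \tag{2.227}$$

where the rotating frame longitudinal auto-relaxation rate is given by

$$\begin{aligned}
R_{1\rho}^{SS} = \frac{1}{T_{1\rho}^{SS}} = \frac{d^2}{8} \Big\{ & 4\sin^2\!\beta\, J(\omega_e) + 2\left[ \sin^4\!\left(\tfrac{\beta}{2}\right) J(\omega_I - \omega_S + \omega_e) + \cos^4\!\left(\tfrac{\beta}{2}\right) J(\omega_I - \omega_S - \omega_e) \right] \\
& + 6\left[ \cos^4\!\left(\tfrac{\beta}{2}\right) J(\omega_S + \omega_e) + \sin^4\!\left(\tfrac{\beta}{2}\right) J(\omega_S - \omega_e) \right] + 3\sin^2\!\beta \left[ J(\omega_I + \omega_e) + J(\omega_I - \omega_e) \right] \\
& + 12\left[ \cos^4\!\left(\tfrac{\beta}{2}\right) J(\omega_I + \omega_S + \omega_e) + \sin^4\!\left(\tfrac{\beta}{2}\right) J(\omega_I + \omega_S - \omega_e) \right] \Big\}
\end{aligned} \tag{2.228}$$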

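and the cross-relaxation rate is

$$\begin{aligned}
R_{1\rho}^{SI} = \frac{1}{T_{1\rho}^{SI}} = \frac{d^2}{8} \Big\{ & 2\left[ \sin^4\!\left(\tfrac{\beta}{2}\right) J(\omega_I - \omega_S + \omega_e) - \cos^4\!\left(\tfrac{\beta}{2}\right) J(\omega_I - \omega_S - \omega_e) \right] \\
& + 3\sin^2\!\beta \left[ J(\omega_I + \omega_e) + J(\omega_I - \omega_e) \right] \\
& + 12\left[ \cos^4\!\left(\tfrac{\beta}{2}\right) J(\omega_I + \omega_S + \omega_e) + \sin^4\!\left(\tfrac{\beta}{2}\right) J(\omega_I + \omega_S - \omega_e) \right] \Big\}
\end{aligned} \tag{2.229}$$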

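In the on-resonance case, where the tilt angle $\beta = \pi/2$, (2.228) reduces to

$$
\begin{aligned}
R_{1\rho}^{SS} = \frac{d^2}{8} \Big\{ & 4J(\omega_e) + 2\left[ \tfrac{1}{4} J(\omega_I - \omega_S + \omega_e) + \tfrac{1}{4} J(\omega_I - \omega_S - \omega_e) \right] + 6\left[ \tfrac{1}{4} J(\omega_S + \omega_e) + \tfrac{1}{4} J(\omega_S - \omega_e) \right] \\
& + 3\left[ J(\omega_I + \omega_e) + J(\omega_I - \omega_e) \right] + 12\left[ \tfrac{1}{4} J(\omega_I + \omega_S + \omega_e) + \tfrac{1}{4} J(\omega_I + \omega_S - \omega_e) \right] \Big\}
\end{aligned}
\tag{2.230}
$$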

Since $\omega_e$ is in the kHz range, whereas $\omega_I$ and $\omega_S$ are in the MHz range, the following approximations may be made:

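$$
J(\omega_I \pm \omega_S \pm \omega_e) \approx J(\omega_I \pm \omega_S) \tag{2.231a}
$$

$$
J(\omega_I \pm \omega_e) \approx J(\omega_I) \tag{2.231b}
$$

$$
J(\omega_S \pm \omega_e) \approx J(\omega_S) \tag{2.231c}
$$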

With these approximations, (2.230) further reduces to

$$
R_{1\rho}^{SS} = \frac{d^2}{8} \left\{ 4J(\omega_e) + J(\omega_I - \omega_S) + 3J(\omega_S) + 6J(\omega_I) + 6J(\omega_I + \omega_S) \right\} \tag{2.232}
$$
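The size of the error introduced by the approximations in (2.231a) to (2.231c) can be checked numerically. The sketch below evaluates (2.228) at β = π/2 and compares it with (2.232). It assumes a simple isotropic-tumbling spectral density with a 3 ns correlation time, a 14.1 T field and a 1.5 kHz effective field; these values and all function names are illustrative choices, not taken from the derivation.

```python
import math

def J(omega, tau_c=3e-9):
    """Assumed spectral density for isotropic tumbling: (2/5) tau_c / (1 + (omega tau_c)^2)."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)

def r1rho_full(beta, w_i, w_s, w_e, d2=1.0):
    """Dipolar rotating-frame auto-relaxation rate of (2.228) for tilt angle beta."""
    s2 = math.sin(beta) ** 2
    s4 = math.sin(beta / 2) ** 4
    c4 = math.cos(beta / 2) ** 4
    return d2 / 8 * (
        4 * s2 * J(w_e)
        + 2 * (s4 * J(w_i - w_s + w_e) + c4 * J(w_i - w_s - w_e))
        + 6 * (c4 * J(w_s + w_e) + s4 * J(w_s - w_e))
        + 3 * s2 * (J(w_i + w_e) + J(w_i - w_e))
        + 12 * (c4 * J(w_i + w_s + w_e) + s4 * J(w_i + w_s - w_e))
    )

def r1rho_approx(w_i, w_s, w_e, d2=1.0):
    """On-resonance rate of (2.232), after dropping omega_e from the MHz-range terms."""
    return d2 / 8 * (
        4 * J(w_e) + J(w_i - w_s) + 3 * J(w_s) + 6 * J(w_i) + 6 * J(w_i + w_s)
    )

# Illustrative Larmor and effective-field frequencies in rad/s (I = 1H, S = 15N).
B0 = 14.1
w_i = 2.68e8 * B0
w_s = -2.71e7 * B0
w_e = 2 * math.pi * 1500.0

full = r1rho_full(math.pi / 2, w_i, w_s, w_e)
approx = r1rho_approx(w_i, w_s, w_e)
rel_diff = abs(full - approx) / full
```

Under these assumptions `rel_diff` is far below the precision of any relaxation measurement, because the ±ωe shifts cancel to first order in each bracketed pair.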

Except that the rate equation in (2.232) contains a term in $J(\omega_e)$ rather than $J(0)$, it is identical to that of the laboratory frame $R_2^{SS}$ for the dipolar interaction in equation (2.140). If the contribution of the CSA interaction is also considered in the on-resonance case, there is an additional term

$$
R_{1\rho(CSA)}^{SS} = \frac{c^2}{6} \left\{ 4J(\omega_e) + \frac{3}{2}\left[ J(\omega_S + \omega_e) + J(\omega_S - \omega_e) \right] \right\} \tag{2.233}
$$

where the constant $c$ is the same as that defined in (2.164). Using the approximation in (2.231c), (2.233) reduces to

$$
R_{1\rho(CSA)}^{SS} = \frac{c^2}{6} \left\{ 4J(\omega_e) + 3J(\omega_S) \right\} \tag{2.234}
$$

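Hence the longitudinal relaxation rate in the rotating frame for the on-resonance condition is

$$
\begin{aligned}
R_{1\rho}^{SS} & = R_{1\rho(DIPOLAR)}^{SS} + R_{1\rho(CSA)}^{SS} \\
& = \frac{d^2}{8} \left\{ 4J(\omega_e) + J(\omega_I - \omega_S) + 3J(\omega_S) + 6J(\omega_I) + 6J(\omega_I + \omega_S) \right\} + \frac{c^2}{6} \left\{ 4J(\omega_e) + 3J(\omega_S) \right\}
\end{aligned}
\tag{2.235}
$$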

Therefore, for the on-resonance case with the tilt angle β at 90°, the longitudinal relaxation rate in the rotating frame, R1ρ, is identical to the transverse relaxation rate R2 in the laboratory frame except for the low frequency term, which is now J(ωe) rather than J(0). Consequently, measurements of R1ρ can be used as a substitute for R2 in the Lipari-Szabo Model-Free estimation of dynamics parameters.

Remark 26. The term for the CSA contribution to the relaxation rate could have been developed in the same way as the dipolar contribution, but for simplicity it is just given. This development of R1ρ is lengthy enough as it is.

Chapter 3

Materials and Methods

3.1 Plasmid construction

The Tat expression vector used throughout this work was constructed by Gillian Henry from the E. coli codon-optimized exon 1 tat gene (residues 1-72 of the HIV-1 BH10 isolate) contained in pSV2tat72, obtained through the AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, NIH from Dr. Alan Frankel [186]. A brief outline of the development of the expression vector is as follows. The tat gene was amplified by the polymerase chain reaction (PCR) using pSV2tat72 as template and the following forward (Nde I) and reverse (Bgl II) primers: 5’-ATGATCGTCATATGGAACCGGTCGACCCGCGT-3’ and 5’-CCGGGAGATCTTCACTGTTTAGACAGAGAAACCTGGTGGGTC-3’. The PCR-amplified DNA was then ligated into pUC18 (Pierce, Milwaukee, WI) that had been opened with Sma I. The insert was sequenced and the resulting plasmid is referred to as pUC18tat. The tat exon 1 gene was excised from pUC18tat using Nde I and Bgl II, and the purified fragment was ligated into pET28b(+) (Novagen, Madison, WI) that had been opened with Nde I and BamH I. The expression vector was verified by sequencing with the PCR primers. The pET28tat plasmid was transformed into NovaBlue cells (Novagen, Madison, WI) for plasmid storage and into E. coli BL21(DE3)pLysS cells for protein expression. The expressed protein carries an N-terminal hexahistidine segment (His-tag) and thrombin cleavage site that together add 20 residues to the 72-residue protein.
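As a quick consistency check on the cloning scheme, the short sketch below searches the two primers quoted above for the Nde I and Bgl II recognition sequences. The recognition sequences CATATG and AGATCT are the standard ones for these enzymes; the helper name is hypothetical. Bgl II and BamH I leave compatible GATC overhangs, which is why the Bgl II end of the fragment can be ligated into a BamH I-opened vector.

```python
# Primer sequences copied from the text (5' to 3').
FORWARD = "ATGATCGTCATATGGAACCGGTCGACCCGCGT"
REVERSE = "CCGGGAGATCTTCACTGTTTAGACAGAGAAACCTGGTGGGTC"

# Standard recognition sequences of the two restriction enzymes.
SITES = {"NdeI": "CATATG", "BglII": "AGATCT"}

def site_positions(seq, site):
    """Return the 0-based start index of every occurrence of `site` in `seq`."""
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]

nde_hits = site_positions(FORWARD, SITES["NdeI"])
bgl_hits = site_positions(REVERSE, SITES["BglII"])

# Length of the expressed construct: 72 Tat residues plus the 20-residue tag segment.
tagged_length = 72 + 20
```

Each primer contains exactly one copy of its intended site, and the construct length of 92 residues matches the "92-residue His-tagged" protein referred to in the NMR sections.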

3.2 Expression of unlabelled His-tagged Tat₁₋₇₂

Initial experiments were designed to test the expression system and were done using non-labelling conditions for the over-expression of Tat. The following expression protocol was developed to increase the protein yield and simplify the procedure to allow convenient production of significant amounts of protein for use in NMR experiments. Transformed cells from a 100 µL glycerol stock were grown in 50 mL of Terrific Broth (TB) (Sigma, St. Louis, MO) supplemented with 34 µg/mL chloramphenicol and 30 µg/mL kanamycin for 16 hours at 37 °C in a rotary shaker. A 10 mL aliquot of the overgrown culture was then added to 1 L of pre-incubated (37 °C) TB (with 34 µg/mL chloramphenicol and 30 µg/mL kanamycin) in a 2 L baffled flask. Cell growth was monitored by optical density measurements at 600 nm until the reading reached 0.8. Expression was then induced with 60 mg of isopropyl-β-D-thiogalactopyranoside (IPTG) (Sigma, St. Louis, MO). This level of IPTG (∼0.25 mM) was chosen following experiments that showed an increasing yield of expressed protein upon reduction of the IPTG concentration. The standard starting point for induction of lac-repressor regulated promoters is 1 mM [187]. In some experiments the IPTG concentration was reduced to as low as 0.1 mM (see Chapter 5 for further details on optimization). Cells were allowed to express for 5 hours before the cell culture was put on ice for 15 minutes to halt protein expression. The cells were then collected by centrifugation at 2,600×g for 15 minutes, sealed in bottles under an argon atmosphere prior to freezing in liquid nitrogen, and stored at -72 °C.
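The quoted IPTG concentration follows directly from the mass added and the culture volume. A minimal sketch of the arithmetic is given below; the molar mass of 238.30 g/mol is an assumed value for anhydrous IPTG (C9H18O5S) and the function name is hypothetical.

```python
IPTG_MOLAR_MASS = 238.30  # g/mol, assumed value for anhydrous IPTG

def millimolar(mass_mg, volume_l, molar_mass=IPTG_MOLAR_MASS):
    """Concentration in mM of `mass_mg` milligrams of solute in `volume_l` litres."""
    # mg / (g/mol) gives mmol; dividing by litres gives mmol/L.
    return mass_mg / molar_mass / volume_l

# 60 mg of IPTG in the 1 L culture described above.
unlabelled_mM = millimolar(60.0, 1.0)
```

With these inputs the result is close to the ∼0.25 mM stated in the text; the same function applied to 240 mg in 1 L gives approximately 1 mM.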

3.3 Expression of ¹³C/¹⁵N-His-tagged Tat₁₋₇₂

The following expression protocol was modified from published methods [188] to reduce the consumption of isotopically-labelled ingredients. As with the unlabelled protein expression, cell growth was initiated from a 100 µL glycerol stock of the pET28tat-transformed cells into 50 mL of TB (with 34 µg/mL chloramphenicol and 30 µg/mL kanamycin) and grown for approximately 15 hours. Four 10 mL aliquots of the 50 mL overgrown cell culture were then used to inoculate 4×2 L baffled flasks each containing 1 L of pre-incubated (37 °C) TB (with 34 µg/mL chloramphenicol and 30 µg/mL kanamycin). Cells were grown at 37 °C in a rotary shaker; growth was halted when the optical density of each flask reached 0.6-0.9. Flasks were submerged in crushed ice for 15 minutes to halt cell growth and then cells were collected by centrifugation at 2,600×g at 4 °C for 15 minutes. Cell pellets were re-suspended in 40 mL of M9 salts solution (see Table 3.1) to wash away residual rich media, pooled, and then centrifuged again at 2,600×g for 15 minutes. The single pooled pellet was then re-suspended in 10 mL of the M9 wash solution and added to 1 L of pre-incubated (37 °C) M9 minimal medium with 34 µg/mL chloramphenicol and 30 µg/mL kanamycin. The M9 medium was adapted from [189] and contained 0.7 g ¹⁵NH₄Cl and 2 g of ¹³C₆-glucose (Cambridge Isotope Laboratories Inc., Andover, MA) and was supplemented with vitamins and micronutrients (see Table 3.1). The cells were allowed to adjust to the new medium for 15 minutes and then over-expression was induced by addition of 240 mg of IPTG. Expression was stopped after 5 hours and cells were harvested by centrifugation at 2,600×g at 4 °C. Cell pellets were re-suspended with 10 mL of M9 wash solution per pellet, pooled, and centrifuged at 2,600×g for 15 minutes. The supernatant was removed and the bottle sealed in an argon atmosphere prior to freezing in liquid nitrogen for storage at -72 °C.

Table 3.1: M9 minimal medium ingredients, adapted from [189].

Component                  Concentration (mM)
KH₂PO₄                     22
Na₂HPO₄                    42
¹⁵NH₄Cl                    12.8
MgSO₄                      2
CaCl₂                      0.01
NaCl                       8.5
FeSO₄                      0.01
U-¹³C₆-glucose             10.7
(NH₄)₆Mo₇O₂₄               3×10⁻⁶
H₃BO₃                      4×10⁻⁴
CoCl₂                      3×10⁻⁵
CuSO₄                      1×10⁻⁵
MnCl₂                      8×10⁻⁵
ZnSO₄                      1×10⁻⁵
Choline chloride           2.9×10⁻³
Folic acid                 1.1×10⁻³
Pantothenic acid           2.1×10⁻³
Nicotinamide               4.1×10⁻³
Myo-inositol               5.5×10⁻³
Pyridoxal hydrochloride    2.4×10⁻³
Thiamin hydrochloride      1.5×10⁻³
Riboflavin                 1.4×10⁻⁴
Biotin                     4.1×10⁻³


3.4 Purification of His-tagged Tat₁₋₇₂

Cell lysis was achieved by two freeze-thaw cycles, each with a 30 minute incubation period at room temperature following complete thawing of the pellet. DNase I and RNase I (Sigma, St. Louis, MO) were added to the lysate (200 µg of each) and incubated at 37 °C for 30 minutes. A 100 mL aliquot of extraction buffer (see Table 3.2) was added to the lysate and the mixture was microprobe-sonicated (twice at 35% power with 30 second bursts and 30 seconds between bursts) using a Fisher Sonic Dismembrator Model 300 (Fisher Scientific, Norcross, GA). The lysate was then centrifuged at 17,000×g for 30 minutes, and the supernatant was poured over a 4 mL bed of Talon™ (cobalt-Superflow™) metal affinity resin (Clontech, Palo Alto, CA) in a 10 mL polypropylene gravity flow column (QIAGEN Inc., Mississauga, ON). Because higher yields were expected for the unlabelled protein, its extract was usually divided into two identical portions to avoid saturating the cobalt metal affinity resin. The resin was pre-equilibrated with the extraction buffer prior to introduction of the extract. The resin was washed with 20 mL of additional extraction buffer followed by 30 mL of wash buffer (see Table 3.2). Tat protein was released from the cobalt column with the elution buffer (see Table 3.2) and 10×1 mL fractions were collected. The fractions were pooled and serially dialysed against 1 L of degassed acetate buffer at pH 3 at concentrations of 0.1 M, 0.05 M, and 0.01 M (approximately 6 hours each). A final dialysis was done against degassed water for 4 hours. Each of the dialysis buffers was sealed under an argon atmosphere. A 1 mL aliquot was removed from the dialysate for near-ultraviolet (near-UV) absorbance analysis and mass spectrometric analysis; the remainder of the dialysate was frozen and freeze-dried.

Table 3.2: Protein purification buffers.

Buffer       pH    Composition
Extraction   7.2   6 M guanidine hydrochloride (Gdn-HCl); 100 mM sodium phosphate; 10 mM tris(hydroxymethyl)aminomethane hydrochloride (Tris-HCl); 10 mM tris(2-carboxyethyl)phosphine (TCEP)
Wash         6.4   6 M Gdn-HCl; 50 mM sodium phosphate; 10 mM TCEP
Elution      4     6 M Gdn-HCl; 50 mM sodium acetate; 10 mM TCEP

3.5 MALDI-TOF-MS

To assess the purity of the protein sample and identify the Tat monomer, Vincent Chen from the Hélène Perreault lab at the University of Manitoba prepared samples for matrix-assisted laser desorption-ionization time-of-flight mass spectrometry (MALDI-TOF-MS). A 10 µL aliquot of the dialysate (from the unlabelled Tat purification) in aqueous solution was subjected to solid phase extraction (SPE) to remove unwanted salts and buffers using a Millipore C18 ZipTip™ (Billerica, MA) following the manufacturer’s recommended protocol. SPE-treated samples were concentrated by aspirating the SPE tip with 2 µL of 50:50 acetonitrile/water with 0.1% trifluoroacetic acid (TFA). Samples were then mixed with 2 µL of sinapinic acid (3,5-dimethoxy-4-hydroxycinnamic acid) matrix solution (Sigma, St. Louis, MO) saturated in water and transferred to a Bruker Scout™ (Billerica, MA) 384 stainless steel target. Mass spectrometric analysis was performed on a Bruker Biflex™ IV MALDI-TOF instrument operated in positive, linear mode with acceleration potentials of 21 kV and 17 kV for lenses 1 and 2, respectively. The instrument was externally calibrated with the [M+H]⁺ and [M+2H]²⁺ ions of bovine serum albumin (BSA) (m/z 66431, m/z 33215) and myoglobin (m/z 16952.62, m/z 8476.81).
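The relation between the singly and doubly protonated calibrant ions can be illustrated with a short calculation. The sketch below assumes a proton mass of 1.007276 Da and uses hypothetical function names; it recovers the neutral mass from the myoglobin [M+H]⁺ value quoted above and predicts the corresponding [M+2H]²⁺ value.

```python
PROTON_MASS = 1.007276  # Da, assumed mass of the charge-carrying proton

def neutral_mass(mz_value, charge):
    """Neutral mass M from the m/z of an [M + zH]^z+ ion."""
    return charge * (mz_value - PROTON_MASS)

def mz(mass, charge):
    """m/z of the [M + zH]^z+ ion for a neutral mass M."""
    return (mass + charge * PROTON_MASS) / charge

# Myoglobin calibrant: start from the quoted [M+H]+ value.
myoglobin_mass = neutral_mass(16952.62, 1)
myoglobin_2plus = mz(myoglobin_mass, 2)
```

The predicted doubly charged value agrees with the quoted m/z 8476.81 to within 0.01, which is the kind of internal consistency expected of a two-point-per-protein external calibration.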

3.6 NMR Sample Preparation

Freeze-dried Tat protein was dissolved in 600 µL of degassed buffer containing 50 mM acetate-d₄/ammonium hydroxide, 20 mM 2-(N-morpholino)ethanesulfonic acid (MES) (only in the ¹³C/¹⁵N-labelled sample), 80 µM sodium sulfite, 0.02% sodium azide and 5% D₂O. The resulting protein solutions were at pH 4 (unlabelled) and pH 4.1 (¹³C/¹⁵N-labelled). The 5 mm (535-PP) NMR tubes (Wilmad-Labglass, Buena, NJ) were purged with argon gas for 15 minutes, and the dissolved protein was then added to the tube under an argon atmosphere. The NMR tube caps were then sealed with Teflon® tape (DuPont, Wilmington, Delaware). The final protein concentration in the NMR tube was 1.5 mM (unlabelled) and 1 mM (¹³C/¹⁵N-labelled) Tat.

3.7 NMR HSQC Acquisition

¹H/¹⁵N heteronuclear single quantum coherence (HSQC) spectra of the 92-residue His-tagged Tat₁₋₇₂ were acquired, for both the unlabelled protein (using the natural abundance of the ¹⁵N isotope for the indirect dimension) and the ¹³C/¹⁵N-labelled protein, on a 600 MHz Varian INOVA spectrometer (14.1 tesla field strength) equipped with a triple resonance probehead at 20.2 °C, using the standard gradient sensitivity-enhanced HSQC Varian BioPack pulse sequence [190]. The NMR probe temperature was calibrated with methanol [169] and spectra were processed with NMRPipe [191]. HSQC experiments were collected with 2048 complex points in the direct dimension for both samples. For the indirect ¹⁵N dimension, 256 and 128 complex points were collected for the ¹³C/¹⁵N-labelled and the unlabelled samples, respectively. Sweep widths in both experiments were 12 ppm in the direct and 36 ppm in the indirect dimensions. A total of 192 transients was collected on the unlabelled Tat protein whereas 32 transients were collected for the ¹³C/¹⁵N-labelled protein. Spectra were apodized using a squared cosine bell function, zero filled to twice (¹³C/¹⁵N-labelled) or four times (unlabelled) the data set size, and linear predicted (forward-backward with eight prediction coefficients) prior to Fourier transformation in the indirect dimension. The dimensions of the resulting processed data sets were 4096×1024 points for both ¹H/¹⁵N-HSQC experiments. Non-linear line shape fitting was performed on the peaks in the spectrum of the unlabelled Tat sample and the noise was subtracted from the result. The HSQC pulse sequence was sensitivity-enhanced and used gradients for coherence selection and water suppression [178]. Radiation damping was suppressed with a water flip-back pulse (1.42 ms). ¹⁵N decoupling during acquisition was done using the WALTZ-16 sequence [192] at a frequency of 7.2 kHz. ¹H chemical shifts were referenced to the water signal that resonates 4.82 ppm from 2,2-dimethyl-2-silapentane-5-sulfonate (DSS) at 293 K [169]. ¹⁵N and ¹³C referencing were done indirectly relative to DSS as recommended [193].

3.8 NMR Backbone Assignments

All backbone assignment experiments for the 92-residue ¹³C/¹⁵N-labelled His-tagged Tat₁₋₇₂ were done on a 600 MHz Varian INOVA spectrometer (14.1 T) equipped with a triple resonance probe head at 20.2 °C, using standard Varian BioPack pulse sequences [190, 194–198] (see Table 3.3). The NMR probe temperature was calibrated with methanol [169] and all spectra were processed with NMRPipe [191]. Spectra were apodized using a squared cosine bell function, zero filled to twice the data set size, and linear predicted (forward-backward with eight prediction coefficients) prior to Fourier transformation. The dimensions of the resulting processed data sets were 4096×1024 for the ¹H/¹⁵N-HSQC experiment and 2048×256×128 for all 3-dimensional experiments. The pulse sequences used are sensitivity-enhanced (with the exception of the HNHA experiment) and use gradients for coherence selection and water suppression [178]. Radiation damping was suppressed with a water flip-back pulse (1.42 ms). ¹⁵N decoupling during acquisition was done using the WALTZ-16 sequence [192] at a frequency of 7.2 kHz. ¹H chemical shifts were referenced to the water signal that resonates 4.821 ppm from 2,2-dimethyl-2-silapentane-5-sulfonate (DSS) at 293 K [169]. ¹⁵N and ¹³C referencing were done indirectly relative to DSS as recommended [193].

Table 3.3: Acquisition parameters for the NMR experiments.

Experimentᵃ                   Scans  Complex points  SW[¹H]  SW[¹³C]  SW[¹⁵N]  Field
                                                     (ppm)   (ppm)    (ppm)    (tesla)
¹H/¹⁵N-HSQCᵇ [190]            192    2048×128        12      -        36       14.1
¹H/¹⁵N-HSQC [190]             32     2048×256        12      -        36       14.1
HNCACB [194]                  16     1024×64×32      12      70       30       14.1
CBCA(CO)NH [195]              8      1024×64×32      10      70       24       14.1
HNCO [196]                    8      1024×64×32      10      8        24       14.1
HN(CA)CO [197]                8      1024×64×32      10      8        24       14.1
HNCA [195, 196, 199, 200]     16     1024×64×30      20      20       20       14.1
HNHA [198]                    8      1024×64×32      10      10       30       14.1
T1 [178, 201]                 8      2048×256        10      -        24       14.1
T2 [178, 201]                 8      2048×256        10      -        24       14.1
T1ρ [202]                     8      2048×256        10      -        24       14.1
NOE [178]                     32     2048×256        10      -        24       14.1
T1 [178, 201]                 4      672×256         15      -        26       18.8
T1ρ [202]                     4      672×256         15      -        26       18.8
NOE [178]                     16     672×256         15      -        26       18.8

ᵃ All HSQC and 3D experiments were acquired with a 0.7 s post-acquisition relaxation delay. Relaxation experiments (T1, T2, T1ρ, and NOE) used a 5 s relaxation delay. The saturation period for the NOE experiments was 5 s.
ᵇ Experiment carried out using the natural abundance of the ¹⁵N isotope in the unlabelled Tat sample.
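The indirect referencing of the ¹⁵N and ¹³C dimensions mentioned in Sections 3.7 and 3.8 can be sketched as follows. The frequency ratios below are the commonly tabulated Ξ values relative to the ¹H signal of DSS and are assumed here to be those recommended in [193]; the ¹H zero-ppm frequency of 599.9 MHz and the function names are purely illustrative.

```python
# Frequency ratios (Xi) relative to the 1H resonance of DSS.
# Assumption: these commonly tabulated values are the ones recommended in [193].
XI = {"13C": 0.251449530, "15N": 0.101329118}

def zero_ppm_frequency(h1_zero_mhz, nucleus):
    """Frequency (MHz) defining 0 ppm for `nucleus`, given the 1H 0 ppm (DSS) frequency."""
    return h1_zero_mhz * XI[nucleus]

def ppm(freq_mhz, zero_mhz):
    """Chemical shift in ppm of a resonance at `freq_mhz` relative to `zero_mhz`."""
    return (freq_mhz - zero_mhz) / zero_mhz * 1e6

# Hypothetical 1H zero-ppm frequency on a nominal 600 MHz spectrometer.
h1_zero = 599.9
n15_zero = zero_ppm_frequency(h1_zero, "15N")
c13_zero = zero_ppm_frequency(h1_zero, "13C")
```

Only the ¹H dimension needs an experimental reference (here the water signal); the heteronuclear zero points then follow from the fixed ratios.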


Chemical shift differences from random coil values were determined according to the method of Schwarzinger et al. [203], in which experimentally derived random coil chemical shifts from model pentapeptides (Ac-G-G-X-G-G-NH2) under denaturing conditions [204] were subtracted from the observed ¹H, ¹³C, and ¹⁵N chemical shifts for His-tagged Tat₁₋₇₂. Because the amide nitrogen, amide proton and carbonyl carbon chemical shifts are very sensitive to the local amino acid sequence (the Cα and Hα shifts are less sensitive), the random coil values were corrected for the effects of the neighbouring residues [203] according to

$$
\delta_{corrected}(i) = \delta_{rc}(i) + \Delta\delta(i-1) + \Delta\delta(i+1) + \Delta\delta(i-2) + \Delta\delta(i+2) \tag{3.1}
$$

where $\delta_{corrected}$ is the sequence-corrected random coil chemical shift for the residue at position $i$ in the sequence, $\delta_{rc}$ is the experimentally derived random coil chemical shift for residue $i$ in the pentapeptide, and the $\Delta\delta$ terms are experimentally determined correction factors for the two residues preceding and following residue $i$. These corrections were applied to the amide HN, amide N, and C’ chemical shifts, as well as to the Cα and Hα chemical shifts (since correction factors were available). No sequence-dependent correction factors were available for the Cβ chemical shift. The random coil values in [204] and the correction factors in [203] were determined at 293 K.

³J(HN,Hα) coupling constants were determined to a first approximation from the ratio of the intensities of the cross- and diagonal-peaks in the HNHA experiment [198]. This approximation assumes that the lineshapes of the cross- and diagonal-peaks are identical. The ³J(HN,Hα) coupling constants are then obtained (ignoring relaxation effects) [169] from the relation

$$
\frac{I_{cross}}{I_{diag}} = -\tan^2\left( \pi\, {}^3J_{H^N H^\alpha}\, 2\delta_2 \right) \tag{3.2}
$$

where $I_{cross}$ and $I_{diag}$ are the intensities of the cross- and diagonal-peaks respectively, and $2\delta_2$ is the re-phasing period, set to 12.5 ms [198]. As the above approximation does not consider the effects of longitudinal relaxation of the Hα proton ($R_1^{H^\alpha}$) during the $2\delta_2$ period, the results obtained under this approximation are likely underestimated by 5-10% [169]. The Δ³J(HN,Hα) values were calculated by subtracting the sequence-corrected COIL values reported in [205] from the above approximation for the ³J(HN,Hα) coupling constants.¹
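Equation (3.2) can be inverted to give the coupling constant directly from a measured intensity ratio. The sketch below is a minimal illustration of that inversion under the same assumptions as the text (identical lineshapes, relaxation ignored); the function names are hypothetical and the re-phasing period is the 12.5 ms quoted above.

```python
import math

REPHASING_PERIOD = 12.5e-3  # s, the 2*delta_2 period quoted in the text

def ratio_from_j(j_hz, period=REPHASING_PERIOD):
    """Cross/diagonal intensity ratio predicted by (3.2) for a coupling of j_hz."""
    return -math.tan(math.pi * j_hz * period) ** 2

def j_from_ratio(i_cross, i_diag, period=REPHASING_PERIOD):
    """Invert (3.2): coupling constant in Hz from cross- and diagonal-peak intensities.

    The two peaks have opposite sign, so -i_cross/i_diag must be non-negative.
    """
    ratio = -i_cross / i_diag
    if ratio < 0:
        raise ValueError("cross- and diagonal-peak intensities must have opposite signs")
    return math.atan(math.sqrt(ratio)) / (math.pi * period)

# Round trip for an illustrative 7 Hz coupling.
example_ratio = ratio_from_j(7.0)
example_j = j_from_ratio(example_ratio, 1.0)
```

The inversion is single-valued for couplings below 1/(2 × 12.5 ms) = 40 Hz, which covers the range expected for ³J(HN,Hα).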

¹ The COIL values account for sequence-dependent effects on the ³J(HN,Hα) coupling constants according to whether the preceding residue is β-branched or aromatic (L-type: Phe, His, Ile, Thr, Val, Trp, and Tyr) or is any other residue (S-type).

3.9 NMR Relaxation Measurements

NMR ¹⁵N-relaxation data were collected on both ¹⁵N-labelled and ¹³C/¹⁵N-labelled His-tagged Tat₁₋₇₂ on a Varian INOVA 600 MHz spectrometer (14.1 T field) at the University of Manitoba and on a Varian INOVA 800 MHz spectrometer (18.8 T field) at the University of Alberta (NANUC) with triple resonance probe heads at 20.2 °C, using Varian BioPack pulse sequences [178, 201, 202]. Cross-peak intensities were measured as peak heights. Spectra were processed with NMRPipe [191], which was also used to fit the relaxation data to two-parameter exponential decays. The errors in the relaxation rates were calculated using the signal-to-noise ratios of the individual peaks and the fits of the data to the decays. Duplicate measurements were made to verify the error estimates. A total of nine data sets were acquired to obtain longitudinal relaxation rates (R1) using relaxation delays of 0, 50, 100, 250, 500, 1000, 1500, 3000, and 4000 ms. Measurements for longitudinal relaxation rates in the rotating frame (R1ρ) were made with eight data sets using spin-lock times of 30, 60, 90, 120, 150, 180, 210, and 240 ms. The ¹⁵N spin-lock continuous-wave frequency for the R1ρ relaxation experiments was 1.5 kHz, with 90° pulse lengths of 166.755 µs and 125.029 µs for the 14.1 T and 18.8 T fields respectively. The R1ρ measurements were corrected for offset from the carrier using the measured R1 values as described in reference [202]. The peaks at the outer edges of the spectra required correction by less than 10%. Transverse relaxation rate (R2) measurements were done at 600 MHz only, with Carr-Purcell-Meiboom-Gill (CPMG) [178] delay times of 30, 60, 90, 120, 150, 180, 210, and 240 ms. The R1 and R1ρ data were acquired using 4 transients whereas 8 transients were collected for the R2 experiments; the post-acquisition relaxation delay was 5 s. Data collected at 600 MHz were 2048×256 complex points with SW[¹H]=10 ppm and SW[¹⁵N]=24 ppm. The steady-state ¹H-¹⁵N NOE values were obtained from ratios of peak heights from experiments with (I_NOE) and without (I_noNOE) saturation of the protons for 5 s at the beginning of the experiment. The heteronuclear NOE values were then obtained from (I_NOE - I_noNOE)/I_noNOE. The spectra were acquired with 32 transients, a 5 s relaxation delay, and the same resolution as in the R1 and R1ρ experiments. Water suppression was achieved through the use of gradients to select for the ¹⁵N-¹H coherence [178]. Data collection at 800 MHz was done exactly as at 600 MHz, except that the resolution for the experiments was 672×256 complex points with SW[¹H]=15 ppm and SW[¹⁵N]=26 ppm, and the NOE experiments were done with 16 transients.
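To make the two-parameter exponential fit concrete, the sketch below fits I(t) = I0 exp(-Rt) to peak heights by linear least squares on ln I. This log-linear form is a simplification chosen for a dependency-free example and is not claimed to be the algorithm used by NMRPipe; the peak heights are synthetic, noise-free values generated for illustration, and only the spin-lock times are taken from the text.

```python
import math

def fit_exponential(delays_s, heights):
    """Least-squares fit of ln(I) = ln(I0) - R*t; returns (I0, R) with R in s^-1."""
    n = len(delays_s)
    log_heights = [math.log(h) for h in heights]
    t_mean = sum(delays_s) / n
    y_mean = sum(log_heights) / n
    sxx = sum((t - t_mean) ** 2 for t in delays_s)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(delays_s, log_heights))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -slope

# Spin-lock times from the text (converted to seconds).
spin_lock_s = [0.030, 0.060, 0.090, 0.120, 0.150, 0.180, 0.210, 0.240]

# Synthetic, noise-free peak heights for a hypothetical R1rho of 8 s^-1.
synthetic_heights = [1000.0 * math.exp(-8.0 * t) for t in spin_lock_s]

i0_fit, r1rho_fit = fit_exponential(spin_lock_s, synthetic_heights)
```

With real data the points are noisy, and a weighted or non-linear fit is generally preferable because taking logarithms distorts the error distribution of the weakest peaks.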

3.10 Relaxation Data Analysis

The measurement of NMR relaxation rates provides a window on protein dynamics over a broad range of timescales: ¹⁵N longitudinal (R1), transverse (R2), rotating-frame (R1ρ), and heteronuclear cross-relaxation (contained in the NOE) rates are sensitive to dynamics on the picosecond-nanosecond timescales, and R2 and R1ρ can also be sensitive to conformational exchange (Rex) on the millisecond to microsecond timescales. The equations relating the macroscopic rates of relaxation (Rx) to the values of the spectral density of motions (J) at the nuclear spin transition frequencies (ω) were given by Abragam [167] and are summarized as follows (see Sections 2.2, 2.3 and 2.4):

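$$
R_1 = \frac{d^2}{4}\left[ J(\omega_H - \omega_N) + 3J(\omega_N) + 6J(\omega_H + \omega_N) \right] + c^2 J(\omega_N) \tag{3.3}
$$

$$
R_2 = R_{1\rho} = \frac{d^2}{8}\left[ 4J(0) + J(\omega_H - \omega_N) + 6J(\omega_H) + 3J(\omega_N) + 6J(\omega_H + \omega_N) \right] + \frac{c^2}{6}\left[ 4J(0) + 3J(\omega_N) \right] + R_{ex} \tag{3.4}
$$

$$
NOE = \frac{\gamma_H}{\gamma_N}\, \frac{d^2}{4}\left[ 6J(\omega_H + \omega_N) - J(\omega_H - \omega_N) \right] \frac{1}{R_1} \tag{3.5}
$$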

The constants $d$ and $c$ in equations (3.3)-(3.10) are defined from (2.132) and (2.164) as

$$
d = \left( \frac{\mu_o}{4\pi} \right) \frac{\gamma_H \gamma_N \hbar}{r_{NH}^3}, \qquad c = \frac{\Delta\sigma\, \omega_N}{\sqrt{3}}
$$

where
• $\mu_o = 4\pi \times 10^{-7}$ kg·m·s⁻²·A⁻² is the permeability constant of free space;
• $\gamma_H = 2.68 \times 10^{8}$ rad·s⁻¹·T⁻¹ is the proton gyromagnetic ratio;
• $\gamma_N = -2.71 \times 10^{7}$ rad·s⁻¹·T⁻¹ is the gyromagnetic ratio of ¹⁵N;
• $r_{NH} = 102$ pm is the proton-nitrogen internuclear separation [206];
• $\Delta\sigma = -172$ ppm is the difference between the parallel and perpendicular components of the ¹⁵N chemical shift tensor [206];
• $\hbar = 1.05 \times 10^{-34}$ J·s is Planck’s constant divided by 2π.

Since equations (3.3)-(3.5) involve spectral density functions at five distinct frequencies, it is not possible to solve the system of relaxation equations with the limited data set of only three relaxation experiments. At least two additional relaxation equations (and corresponding data sets) would be necessary to unambiguously evaluate the spectral densities at these five frequencies. Peng and Wagner originally proposed full spectral density mapping [181, 207] using a number of relaxation experiments equal to the number of distinct frequencies of the spectral density function plus an additional experiment to account for the conformational exchange contribution (Rex) to R1ρ or R2. However, it was later found that, using the methods of Farrow et al. [207, 208], it is possible to reduce the complexity of the system by combining the three high frequency spectral densities into a single spectral density function for J(ωH) and incorporating the exchange contribution (if present) into an effective J(0) estimate such that

$$
J_{eff}(0) = J(0) + \lambda R_{ex} \tag{3.6}
$$

where the constant $\lambda$ is defined as

$$
\lambda = \frac{6}{3d^2 + 4c^2} \tag{3.7}
$$
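For orientation, the constants d, c and λ can be evaluated numerically from the values listed above. The following sketch does so in SI units; the function names are hypothetical, and the rounded physical constants are exactly those quoted in the list, so the results carry the same limited precision.

```python
import math

# Constants as quoted in the text (SI units).
MU0_OVER_4PI = 1e-7        # mu_o / (4 pi)
GAMMA_H = 2.68e8           # rad s^-1 T^-1
GAMMA_N = -2.71e7          # rad s^-1 T^-1
HBAR = 1.05e-34            # J s
R_NH = 102e-12             # m
DELTA_SIGMA = -172e-6      # dimensionless (ppm converted to a fraction)

def dipolar_constant():
    """d = (mu_o / 4 pi) * gamma_H * gamma_N * hbar / r_NH^3, in rad/s."""
    return MU0_OVER_4PI * GAMMA_H * GAMMA_N * HBAR / R_NH ** 3

def csa_constant(b0_tesla):
    """c = delta_sigma * omega_N / sqrt(3), with omega_N = gamma_N * B0, in rad/s."""
    omega_n = GAMMA_N * b0_tesla
    return DELTA_SIGMA * omega_n / math.sqrt(3)

def exchange_scale(b0_tesla):
    """lambda of (3.7): 6 / (3 d^2 + 4 c^2)."""
    d2 = dipolar_constant() ** 2
    c2 = csa_constant(b0_tesla) ** 2
    return 6.0 / (3.0 * d2 + 4.0 * c2)

d = dipolar_constant()
c_600 = csa_constant(14.1)
c_800 = csa_constant(18.8)
lam_600 = exchange_scale(14.1)
```

Note that d is independent of field whereas c scales linearly with B0, which is the property exploited by the two-field analysis later in this section.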

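The result is a system of three equations with spectral density functions at only three frequencies. For this reduced spectral density mapping approach, equations (3.3)-(3.5) can be approximated as follows [91, 208, 209]:

$$
R_1 = \frac{d^2}{4}\left[ 3J(\omega_N) + 7J(\beta_1\omega_H) \right] + c^2 J(\omega_N) \tag{3.8}
$$

$$
R_2 = R_{1\rho} = \frac{d^2}{8}\left[ 4J_{eff}(0) + 3J(\omega_N) + 13J(\beta_2\omega_H) \right] + \frac{c^2}{6}\left[ 4J_{eff}(0) + 3J(\omega_N) \right] \tag{3.9}
$$

$$
NOE = \frac{\gamma_H}{\gamma_N}\, \frac{d^2}{4}\left[ 5J(\beta_3\omega_H) \right] \frac{1}{R_1} \tag{3.10}
$$

where $\beta_1 = 0.921$, $\beta_2 = 0.955$ and $\beta_3 = 0.87$. The reduced spectral density approximations in equations (3.8)–(3.10) result in solutions for $J_{eff}(0)$, $J(\omega_N)$ and $J(\beta_3\omega_H)$. The $J(\beta_i\omega_H)$ term can be approximated in several ways, but for these analyses it has been approximated according to reference [91] by

$$
J(\beta_i\omega_H) = \left( \frac{\beta_3}{\beta_i} \right)^2 J(\beta_3\omega_H) \tag{3.11}
$$

The solution [210] to the system of equations in (3.8)-(3.10) is then

$$
J_{eff}(0) = \frac{1}{3d^2 + 4c^2}\left[ 6R_{1\rho} - R_1\left( 3 + \frac{18}{5}\frac{\gamma_N}{\gamma_H} NOE \right) \right] \tag{3.12}
$$

$$
J(\omega_N) = \frac{4}{3d^2 + 4c^2}\, R_1 \left[ 1 - \frac{7}{5}\frac{\gamma_N}{\gamma_H} NOE \right] \tag{3.13}
$$

$$
J(0.87\omega_H) = \frac{4}{5d^2}\, \frac{\gamma_N}{\gamma_H}\, R_1\, NOE \tag{3.14}
$$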
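Equations (3.12)-(3.14) translate directly into a few lines of code. The sketch below implements them as written, together with a forward model built from (3.8)-(3.10) that is used only to check the algebra by a round trip. In the forward model the three high-frequency terms are set to a single value, a simplifying assumption under which (3.12)-(3.14) are exact; all names and numerical inputs are illustrative and are not measured data.

```python
import math

def reduced_spectral_densities(r1, r1rho, noe, d2, c2, gamma_ratio):
    """Equations (3.12)-(3.14).

    noe is the enhancement (I_NOE - I_noNOE)/I_noNOE and gamma_ratio is gamma_N/gamma_H.
    Returns (J_eff(0), J(omega_N), J(0.87 omega_H)).
    """
    denom = 3.0 * d2 + 4.0 * c2
    j_eff0 = (6.0 * r1rho - r1 * (3.0 + 18.0 / 5.0 * gamma_ratio * noe)) / denom
    j_n = 4.0 * r1 * (1.0 - 7.0 / 5.0 * gamma_ratio * noe) / denom
    j_h = 4.0 / (5.0 * d2) * gamma_ratio * r1 * noe
    return j_eff0, j_n, j_h

def forward_rates(j0, j_n, j_h, d2, c2, gamma_ratio):
    """Equations (3.8)-(3.10), with every J(beta_i omega_H) set to j_h (assumption)."""
    r1 = d2 / 4.0 * (3.0 * j_n + 7.0 * j_h) + c2 * j_n
    r1rho = d2 / 8.0 * (4.0 * j0 + 3.0 * j_n + 13.0 * j_h) + c2 / 6.0 * (4.0 * j0 + 3.0 * j_n)
    noe = (1.0 / gamma_ratio) * d2 / 4.0 * 5.0 * j_h / r1
    return r1, r1rho, noe

# Illustrative inputs only (d^2 and c^2 in rad^2 s^-2, spectral densities in s/rad).
d2, c2 = 5.16e9, 1.44e9
gamma_ratio = -2.71e7 / 2.68e8
true_values = (1.0e-9, 3.0e-10, 1.0e-11)

rates = forward_rates(*true_values, d2, c2, gamma_ratio)
recovered = reduced_spectral_densities(*rates, d2, c2, gamma_ratio)
```

The round trip returns the input spectral densities, which confirms that the coefficients 18/5 and 7/5 in (3.12) and (3.13) follow from (3.8)-(3.10) when the high-frequency terms are treated as equal.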

Note that R2 and R1ρ are determined by the same combination of spectral density values as long as the ¹⁵N spin-lock is on resonance for all spins [181] (see Section 2.6). J(0.87ωH) is determined from equation (3.10), and J(0.921ωH) and J(0.955ωH) are calculated directly from it using the assumption that at high frequency J(ω) ∝ 1/ω². One advantage of measuring R1ρ is that, in contrast to R2, contributions from conformational exchange are minimized (Rex ∼ 0) as long as the nitrogen carrier is placed on resonance and the spin-lock power is sufficiently high [211–213]. In the event that conformational exchange contributions are significant, Jeff(0) should be interpreted as a combination of slow motions (i.e., molecular tumbling) and conformational exchange on the µs-ms timescale. The reduced spectral density approach thus allows a direct calculation of J(ωN) and Jeff(0) from the measured relaxation rates and steady-state NOE (strictly, Jeff(ωe), where ωe is the magnitude of the effective field in the presence of the spin-lock, expressed as a frequency). Uncertainties in the spectral densities were determined by repeating the calculations 500 times using the standard deviations of the NMR measurements and Monte Carlo methods to generate a normal distribution as described in [214, 215].
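The Monte Carlo error propagation described above can be sketched generically: each measured quantity is resampled from a normal distribution defined by its value and standard deviation, the derived quantity is recalculated, and the spread of the results is taken as its uncertainty. The function below is a minimal stand-in for that procedure and is not the notebook used in this work; the derived quantity (a simple product) and the input values are hypothetical.

```python
import random
import statistics

def monte_carlo_sd(func, means, sds, n_trials=500, seed=0):
    """Propagate normally distributed uncertainties through `func`.

    `means` and `sds` hold the measured values and their standard deviations,
    in the order of the positional arguments of `func`.
    Returns (mean, standard deviation) of the simulated results.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    results = []
    for _ in range(n_trials):
        sample = [rng.gauss(m, s) for m, s in zip(means, sds)]
        results.append(func(*sample))
    return statistics.mean(results), statistics.stdev(results)

# Hypothetical derived quantity: the product R1 * NOE with illustrative uncertainties.
product_mean, product_sd = monte_carlo_sd(
    lambda r1, noe: r1 * noe, means=[1.6, -0.5], sds=[0.05, 0.03]
)
```

With 500 trials the standard deviation itself is known only to a few percent, which is usually adequate for reporting error bars on spectral densities.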

The calculations were done using a Mathematica 5.0 notebook that I modified from the original form (written and provided by Leo Spyracopoulos [216]), using the program’s built-in simulated annealing protocol [217]; statistical analyses were done with the program JMP IN 5.1 (SAS Institute Inc., Cary, NC).

Relaxation measurements were done at two fields to permit finer mapping of the spectral density and, more specifically, to test the assumptions inherent in the reduced spectral density analysis. In addition, since Rex scales with the square of the applied magnetic field, it is possible to determine the contribution of Rex to equation (3.4) by measuring relaxation parameters at two fields. Thus, Rex and Jeff(0) values were calculated from the relaxation measurements at 600 MHz and 800 MHz as described in [209] using the following relations:

$$
\begin{aligned}
J_{eff}(0) = \frac{1}{\beta}\Big[ & \left\{ R_{1\rho}^{800} - \kappa R_{1\rho}^{600} \right\} - \frac{3d^2}{8}\left\{ J(\omega_N^{800}) - \kappa J(\omega_N^{600}) \right\} - \frac{c_{800}^2}{2}\left\{ J(\omega_N^{800}) - J(\omega_N^{600}) \right\} \\
& - \frac{13d^2}{8}\left\{ J(0.96\omega_H^{800}) - \kappa J(0.96\omega_H^{600}) \right\} \Big]
\end{aligned}
\tag{3.15}
$$

$$
R_{ex} = R_{1\rho}^{600} - \left( \frac{d^2}{2} + \frac{2c_{600}^2}{3} \right) J_{eff}(0) - \left( \frac{3d^2}{8} + \frac{c_{600}^2}{2} \right) J(\omega_N^{600}) - \frac{13d^2}{8} J(0.96\omega_H^{600}) \tag{3.16}
$$

where the fields are denoted by their proton Larmor frequency in the superscripts and subscripts, $\kappa = (\omega_H^{800}/\omega_H^{600})^2$, $\beta = (d^2/2)(1 - \kappa)$, $d$ is defined as above since it is field independent, and $c_i$ is the constant $c$ from equations (3.8) and (3.9) evaluated with $\omega_N$ for field $i$, so that $c_{800}^2 = \kappa c_{600}^2$. In this analysis, the longitudinal relaxation rates in the rotating frame ($R_{1\rho}^{600}$ and $R_{1\rho}^{800}$) have been used instead of the transverse relaxation rates (used by Farrow et al. [209]) in equations (3.15) and (3.16).

The R1ρ relaxation data were modelled by assuming that the effect of neighbouring residues (j) on the correlation time of a residue (i) decreases exponentially as the distance from the residue increases, an approach first described in [211]:

$$
R_{1\rho}(i) = R_{1\rho}^{int} \sum_{j=1}^{N} V \exp\left( -\frac{|i - j|}{L} \right) \tag{3.17}
$$

where $R_{1\rho}^{int}$ is an intrinsic residue relaxation rate, $N$ is the length of the polypeptide, $V$ is the residue molecular volume [218], and $L$ is the persistence length of the polypeptide in residues.

A different solution to equations (3.3)-(3.5) was proposed by Lipari and Szabo [175, 176], who derived a simplified spectral density function $J(\omega)_{LS}$ on the assumption that global molecular reorientation ($\tau_c$) and fast internal motions ($\tau_e$) are stochastically uncorrelated [219]:

$$
J(\omega)_{LS} = \frac{2}{5}\left[ \frac{S^2 \tau_c}{1 + (\omega\tau_c)^2} + \frac{(1 - S^2)\tau}{1 + (\omega\tau)^2} \right] \tag{3.18}
$$

where $1/\tau = 1/\tau_c + 1/\tau_e$. The Lipari-Szabo model-free spectral density reduces the number of unknown parameters in equations (3.3)-(3.5) to three: $S^2$, the square of the generalized order parameter, which indicates the degree of spatial freedom of the internal motion; $\tau_c$, the global rotational correlation time for molecular reorientation; and $\tau_e$, the effective internal rotational correlation time, which is related to both the amplitude and the rate of internal motion. The separability of internal and overall dynamics is questionable for a random coil polymer, but comparisons of the Lipari-Szabo parameters to those obtained for other folded and unfolded proteins can be informative. Relaxation data were analysed using the approach developed by Schurr et al. [220] in which all three Lipari-Szabo parameters are optimized for each residue individually, as this is reported to provide a significantly better fit to the NMR data [220]. The analysis was initially carried out using the simple model in (3.18), but additional models were tested using variations of the extended model-free approach [221] and the Cole-Cole distribution [222–224].
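The model-free spectral density of (3.18) is simple to evaluate, and a short sketch may help in seeing how the three parameters enter. The function name and the parameter values below are illustrative only; they are not fitted values from this work.

```python
def j_lipari_szabo(omega, s2, tau_c, tau_e):
    """Lipari-Szabo spectral density of (3.18).

    omega in rad/s, correlation times in seconds, s2 is the squared order parameter.
    """
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)  # effective time: 1/tau = 1/tau_c + 1/tau_e
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)

# Illustrative parameters: 5 ns overall tumbling, 50 ps internal motion.
rigid_j0 = j_lipari_szabo(0.0, s2=1.0, tau_c=5e-9, tau_e=50e-12)
flexible_j0 = j_lipari_szabo(0.0, s2=0.5, tau_c=5e-9, tau_e=50e-12)
```

For S² = 1 the expression collapses to the rigid-tumbling form (2/5)τc/(1 + ω²τc²), and lowering S² transfers weight to the much shorter effective time τ, which reduces J(0).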


The extended model-free or two-timescale method proposed by Clore et al. [221] separates the correlation time for internal motions, $\tau_e$, into fast ($\tau_f$) and slow ($\tau_s$) components. In that work [221], it was found that the time evolution of the internal reorientational correlation function probed by NMR, $C_I(t)$ in (2.197), was non-exponential when the slow motions were not at the extreme narrowing limit. The proposed solution to describing this behaviour is an expression for the internal correlation function of the form

$$
C_I(t) = S^2 + A_f e^{-t/\tau_f} + A_s e^{-t/\tau_s} \tag{3.19}
$$

with

$$
S^2 + A_f + A_s = 1 \tag{3.20}
$$

If $\tau_f$ and $\tau_s$ differ by at least one order of magnitude, then $C_I(t)$ will tend towards an intermediate plateau before reaching a final plateau at $S^2$. Clore et al. [221] suggest that, with such a separation of timescales, the term $1 - A_f$ could be interpreted as the generalized order parameter for fast motions, denoted $S_f^2$. If it is then assumed that the fast motions are axially symmetric and independent of the slow motions, the generalized order parameter can be decomposed into two independent components as

$$
S^2 = S_f^2 S_s^2 \tag{3.21}
$$

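The extended model-free spectral density (footnote 2) can then be expressed as

$$
\begin{aligned}
J(\omega)_{\mathrm{ext}} &= \frac{2}{5}\left[\frac{S^2\tau_c}{1+(\omega\tau_c)^2} + \frac{(S_f^2 - S^2)\,\tau_s'}{1+(\omega\tau_s')^2} + \frac{(1-S_f^2)\,\tau_f'}{1+(\omega\tau_f')^2}\right] \\
&= \frac{2}{5}\left[\frac{S_f^2 S_s^2\tau_c}{1+(\omega\tau_c)^2} + \frac{S_f^2(1-S_s^2)\,\tau_s'}{1+(\omega\tau_s')^2} + \frac{(1-S_f^2)\,\tau_f'}{1+(\omega\tau_f')^2}\right]
\end{aligned}
\tag{3.22}
$$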

where S²s is the generalized order parameter for slow motions (equivalent to 1 − As), 1/τs′ = 1/τc + 1/τs and 1/τf′ = 1/τc + 1/τf. The extended model-free approach requires either measurements at multiple fields or measurements of more than three relaxation rates at the same field (the former is most often done), since the three relaxation relations in equations (3.3)-(3.5) are not sufficient alone to estimate more than three dynamics parameters.

Footnote 2: Note that the relation for the extended model-free spectral density described here differs from that in [221] by a factor of 2/5. The factor has been included here to be consistent with the way the spectral density function has been defined in Chapter 2.

Another approach suggests that for unfolded or disordered proteins, a single local overall rotational correlation time, τc, is not an appropriate description of the dynamics [223–225]. Disordered or denatured proteins consist of an ensemble of conformational states that interconvert rapidly on the nanosecond timescale, and the dynamics at each residue should reflect that ensemble of conformations. The individual residues along the disordered protein may be more appropriately described by a statistical distribution of correlation times on the nanosecond timescale [223–225]. There have been two approaches to this modification: one is to assume that the distribution of correlation times is Lorentzian [225], and the other is to assume that the correlation times follow the Cole-Cole distribution (see below) [222–224]. For the relaxation data in the present analysis, the Cole-Cole distribution was chosen to estimate the overall rotational correlation times as it was more easily implemented in calculations due to its similarity to the standard Lipari-Szabo spectral density. The Cole-Cole distribution function [222–224] is defined as

$$F(s) = \frac{1}{2\pi}\,\frac{\sin(\varepsilon\pi)}{\cosh(\varepsilon s) + \cos(\varepsilon\pi)} \tag{3.23}$$

where s = ln(τc/τ0), τ0 is the centre of the distribution and ε defines the width of the distribution (0 < ε < 1). The resulting spectral density function based on the Cole-Cole distribution is applied to the model-free formalism to obtain the Cole-Cole spectral density function [223, 224]

$$J_{CC}(\omega) = \frac{2}{5}\left[\frac{S^2\,\omega^{\varepsilon-1}\,\tau_0^{\varepsilon}\,\sin\left(\frac{\pi}{2}\varepsilon\right)}{1+(\omega\tau_0)^{2\varepsilon}+2(\omega\tau_0)^{\varepsilon}\cos\left(\frac{\pi}{2}\varepsilon\right)} + \frac{(1-S^2)\,\tau}{1+(\omega\tau)^2}\right] \tag{3.24}$$
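The three spectral densities discussed so far can be compared numerically. The following is a minimal Python sketch, not the Mathematica program used in this work; the function names are hypothetical, and the Cole-Cole function assumes that τ in (3.24) is defined by 1/τ = 1/τ0 + 1/τe, by analogy with the Lipari-Szabo case. The Lipari-Szabo form is written as the ε = 1 limit of (3.24).

```python
import math

def j_lipari_szabo(omega, s2, tau_c, tau_e):
    """Simple model-free spectral density (2/5 prefactor included)."""
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)  # 1/tau = 1/tau_c + 1/tau_e
    return 0.4 * (s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
                  + (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2))

def j_extended(omega, s2f, s2s, tau_c, tau_s, tau_f):
    """Extended (two-timescale) spectral density, second line of eq. (3.22)."""
    tau_s_eff = 1.0 / (1.0 / tau_c + 1.0 / tau_s)
    tau_f_eff = 1.0 / (1.0 / tau_c + 1.0 / tau_f)
    return 0.4 * (s2f * s2s * tau_c / (1.0 + (omega * tau_c) ** 2)
                  + s2f * (1.0 - s2s) * tau_s_eff / (1.0 + (omega * tau_s_eff) ** 2)
                  + (1.0 - s2f) * tau_f_eff / (1.0 + (omega * tau_f_eff) ** 2))

def j_cole_cole(omega, s2, tau_0, tau_e, eps):
    """Cole-Cole spectral density, eq. (3.24); intended for omega > 0."""
    # Assumption: tau combines tau_0 and tau_e as in the Lipari-Szabo model.
    tau = 1.0 / (1.0 / tau_0 + 1.0 / tau_e)
    x = (omega * tau_0) ** eps
    overall = (omega ** (eps - 1.0) * tau_0 ** eps * math.sin(0.5 * math.pi * eps)
               / (1.0 + x * x + 2.0 * x * math.cos(0.5 * math.pi * eps)))
    return 0.4 * (s2 * overall + (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2))

# Illustrative (made-up) parameters at a 15N frequency of about 60.8 MHz.
omega_n = 2.0 * math.pi * 60.8e6
j_ls = j_lipari_szabo(omega_n, 0.5, 3e-9, 5e-11)
j_cc = j_cole_cole(omega_n, 0.5, 3e-9, 5e-11, 1.0)   # zero-width distribution
j_ex = j_extended(omega_n, 1.0, 0.5, 3e-9, 5e-11, 1e-12)  # no fast motion
print(j_ls, j_cc, j_ex)
```

With ε = 1 the Cole-Cole value coincides with the Lipari-Szabo value, and with S²f = 1 the extended form collapses to the Lipari-Szabo form with τe = τs, which is a useful sanity check on any implementation.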

The distribution width is 1 − ε, so in the event that ε = 1 (i.e., zero width), the Cole-Cole spectral density equation reduces to the Lipari-Szabo relation in equation (3.18) and τ0 becomes equivalent to τc.

Using a program written with Mathematica 5.0, based on the single-field versions in reference [216], a series of models was tested that utilized the single-field relaxation data alone, as well as two-field data. For the extended and Cole-Cole models, the two-field data were required due to the number of parameters. In several of the tested models, an additional parameter, corresponding to the conformational exchange rate [226], was added to the transverse relaxation rate in equation (3.4):

$$R_2 = R_{1\rho} = \frac{d^2}{8}\left[4J(0) + J(\omega_H-\omega_N) + 6J(\omega_H) + 3J(\omega_N) + 6J(\omega_H+\omega_N)\right] + \frac{c^2}{6}\left[4J(0)+3J(\omega_N)\right] + R_{ex} \tag{3.25}$$

Rex is the field-dependent exchange rate and is defined as [91]

$$R_{ex} = \Phi_{ex} B_0^2 \tag{3.26}$$

where Φex is the field-independent contribution to the exchange rate. The errors in the Lipari-Szabo and Cole-Cole parameters were determined by Monte Carlo analysis as described above for the spectral density analysis, except that only 100 points were calculated [214, 215]. The tested models (Table 3.4) were evaluated based on both R-factors (Rf) [227] and the Akaike information criterion (AIC) as described in [228], which is based on the χ² test statistic and the number of parameters being optimized. The form of the χ² error function is taken from [229] and is defined as

$$\chi^2 = \sum_{j}^{n}\sum_{i}^{N}\left[\left(\frac{R_{1(i,j)}(\mathrm{calc}) - R_{1(i,j)}(\mathrm{exp})}{\delta R_{1(i,j)}}\right)^2 + \left(\frac{R_{1\rho(i,j)}(\mathrm{calc}) - R_{1\rho(i,j)}(\mathrm{exp})}{\delta R_{1\rho(i,j)}}\right)^2 + \left(\frac{NOE_{(i,j)}(\mathrm{calc}) - NOE_{(i,j)}(\mathrm{exp})}{\delta NOE_{(i,j)}}\right)^2\right] \tag{3.27}$$

where i is the residue index, N is the number of residues, j is the field index and n is the number of fields. The terms exp and calc refer to the experimental and back-calculated (from model estimates) values for the relaxation parameters, respectively. δX (X being either R1, R1ρ or NOE) is the estimated error in the relaxation parameter, either experimental or estimated from Monte Carlo simulation. The AIC value for a given model is then χ² + 2p, where p is the number of parameters being optimized.
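The field scaling of Rex and the χ² and AIC statistics are simple enough to sketch in a few lines. The Python below is an illustrative sketch only (the function names and the numerical values are hypothetical, not taken from the data in this work); it treats the calculated values, experimental values and errors as flat lists covering all residues, fields and relaxation parameters.

```python
def r_ex(phi_ex, b0):
    """Exchange contribution at field b0 (tesla), eq. (3.26)."""
    return phi_ex * b0 ** 2

def chi_square(calc, exp, err):
    """Sum of squared, error-weighted residuals, as in eq. (3.27)."""
    return sum(((c - e) / d) ** 2 for c, e, d in zip(calc, exp, err))

def aic(chi2, n_params):
    """Akaike information criterion in the form chi^2 + 2p."""
    return chi2 + 2 * n_params

# Made-up example: R1, R1rho and NOE for one residue at one field.
calc = [1.50, 3.8, -1.20]
exp = [1.45, 4.0, -1.27]
err = [0.05, 0.2, 0.07]

chi2 = chi_square(calc, exp, err)
print(chi2, aic(chi2, n_params=3))

# Rex scales with the square of the field, so the ratio between
# 18.8 T and 14.1 T is (18.8 / 14.1)^2 regardless of phi_ex.
print(r_ex(1.0, 18.8) / r_ex(1.0, 14.1))
```

Because each residual in the example equals one estimated error, χ² is close to 3 and the AIC for a three-parameter model is close to 9.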


Table 3.4: Models tested using Lipari-Szabo and Cole-Cole model-free methods

model(a)   model type    optimised parameters           fixed parameters    field(s)
model 1    LS(b)         S², τc                         τe = 0; Rex = 0     14.1 T / 18.8 T / 14.1 T, 18.8 T
model 2    LS            S², τc, τe                     Rex = 0             14.1 T / 18.8 T / 14.1 T, 18.8 T
model 3    LS            S², τc, τe, Rex                (none)              14.1 T, 18.8 T
model 4    LS(ext)(c)    S²f, S²s, τc, τs               τf = 0; Rex = 0     14.1 T, 18.8 T
model 5    LS(ext)       S²f, S²s, τc, τs, τf           Rex = 0             14.1 T, 18.8 T
model 6    LS(ext)       S²f, S²s, τc, τs, τf, Rex      (none)              14.1 T, 18.8 T
model 7    CC(d)         S², τ0, τe, ε                  Rex = 0             14.1 T, 18.8 T
model 8    CC            S², τ0, τe, ε, Rex             (none)              14.1 T, 18.8 T

(a) Models 1 and 2 can be used with relaxation measurements at a single field, but models 3–8 require measurements from at least two fields.
(b) LS denotes the per-residue model-free variation of the Lipari-Szabo spectral density proposed by Schurr et al. [220].
(c) LS(ext) denotes model-free estimates using the Clore et al. [221] variation of the Lipari-Szabo spectral density for fast and slow internal motions in eq. (3.22).
(d) CC denotes model-free estimation using the Cole-Cole spectral density function proposed by Buevich et al. [223, 224] given in eq. (3.24).

Models were selected initially based on R-factors (Rf) such that models with reasonably low Rf values (0.15 or less) were then used for determining the Monte Carlo error estimates. The best model was then selected from the reduced set of models (Rf < 0.15) based on the mean AIC ± the standard deviation for the model.
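The two-stage selection just described (an Rf filter followed by a comparison of mean AIC values) can be sketched as follows. This is a hedged illustration: the function name, the dictionary layout and all numerical values are hypothetical, and the cutoff is applied as "0.15 or less".

```python
def select_model(models, rf_cutoff=0.15):
    """Pick the model with the lowest mean AIC among those passing the Rf filter.

    models maps a model name to a (rf, mean_aic) tuple.
    Returns None if no model passes the filter.
    """
    candidates = {name: mean_aic
                  for name, (rf, mean_aic) in models.items()
                  if rf <= rf_cutoff}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

# Made-up values for illustration only.
example = {
    "model 2": (0.10, 12.0),
    "model 5": (0.08, 15.0),
    "model 7": (0.30, 5.0),   # lowest AIC, but rejected by the Rf filter
}
print(select_model(example))
```

In practice the standard deviation of the AIC would also be inspected before preferring one model over another with a similar mean value.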

3.11 pH and Hydrogen Exchange

The dialysate from a ¹³C/¹⁵N-labelled His-tagged Tat1−72 preparation was separated into two equal portions and freeze-dried separately. One half of the freeze-dried protein was dissolved in 550 µL of a degassed aqueous solution of 80 µM sodium sulfite, 0.02% sodium azide and 5% D₂O (i.e., no buffer). The resulting protein solution was at pH 3.3 and was then added under an argon atmosphere to an NMR sample tube. Subsequently, a ¹H/¹⁵N-HSQC spectrum was collected under the same conditions as described previously. Following the acquisition of the HSQC spectrum, a 50 µL aliquot of 0.6 M degassed MES buffer at pH 6 was added to the sample tube under an argon atmosphere. The pH of the resulting solution was quickly measured under ambient atmosphere and then the sample was degassed. The resulting solution was at pH 5.3. A second ¹H/¹⁵N-HSQC spectrum was collected and a small aliquot of degassed 1.0 M sodium hydroxide was added. The pH of the resulting solution under ambient atmosphere was determined to be 5.8 and then the sample was degassed. Following acquisition of a ¹H/¹⁵N-HSQC spectrum of the pH 5.8 protein solution, another small aliquot of degassed 1.0 M sodium hydroxide was added; the new pH was measured (under ambient atmosphere) to be 6.7 and the sample was then degassed. A final ¹H/¹⁵N-HSQC spectrum was collected. At pH 6.7, the protein solution was at the limit of the effective buffering range of MES (pH 5.5-6.7). All pH measurements were done under ambient atmospheric conditions as quickly as possible and the sample was degassed immediately following the pH measurement in order to minimize the chance of oxidation of the protein.

The resulting ¹H/¹⁵N-HSQC spectra (at pH 3.3, 5.3, 5.8 and 6.7) were analysed along with the pH 4.1 spectra (described previously) and the peak heights and widths were tabulated. In order to determine whether the observed losses in peak intensity with increasing pH are a result of the increasing rates of hydrogen exchange with the water solvent, the theoretical hydrogen exchange rates for unfolded Tat at pH values of 3.3, 4.1, 5.3, 5.8 and 6.7 were calculated taking into account the nearest neighbour inductive and steric effects [230]. Predicted hydrogen exchange rates were determined using a Microsoft Excel spreadsheet provided by Walter Englander, University of Pennsylvania School of Medicine (HX2.med.upenn.edu). The spreadsheet determines the intrinsic hydrogen exchange rates for a protein in a fully opened conformation (unprotected) at any pH and temperature and includes the influence of neighbouring side-chains.
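The pH dependence that motivates this calculation can be illustrated with the general form of the intrinsic amide exchange rate, a sum of acid-catalysed, base-catalysed and water-catalysed terms. The Python sketch below is not the spreadsheet calculation: the rate constants are round placeholder values chosen only to show the shape of the curve, and the residue-specific and temperature corrections of [230] are omitted.

```python
def intrinsic_exchange_rate(ph, log_ka=2.0, log_kb=10.0, k_w=0.0, pk_w=14.0):
    """k_ex = k_A[H+] + k_B[OH-] + k_W with placeholder rate constants.

    log_ka, log_kb, k_w and pk_w are hypothetical illustrative values,
    not the reference values used in the spreadsheet.
    """
    acid = 10.0 ** (log_ka - ph)          # acid-catalysed term, k_A [H+]
    base = 10.0 ** (log_kb + ph - pk_w)   # base-catalysed term, k_B [OH-]
    return acid + base + k_w

# pH values at which HSQC spectra were recorded.
for ph in (3.3, 4.1, 5.3, 5.8, 6.7):
    print(f"pH {ph:3.1f}  k_ex (arbitrary units) = {intrinsic_exchange_rate(ph):.3g}")
```

Above the pH of minimum exchange the base-catalysed term dominates, so the predicted rate rises roughly ten-fold per pH unit, which is why amide peak intensities are expected to fall as the pH is raised.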


Chapter 4 Results

4.1 Protein Expression and Purification

Growth of E. coli BL21(DE3)pLysS cells containing pET28tat in TB typically yielded about 10 g of cells (wet weight) per litre of TB medium. Yields of E. coli are reduced by about half when the cells are grown in 1 L of ¹³C/¹⁵N labelling medium (M9) using cells from 4 L of TB, as described in Chapter 3. Typically, in both unlabelled and labelled protein purifications, the protein dialysate is free of visible precipitate. UV absorbance measurements of the protein dialysate at 280 nm (calculated ε280 = 9090 M⁻¹ cm⁻¹ [231]) were used to determine protein yields of His-tagged Tat. Typically, up to 20 mg of unlabelled protein and 15 mg of ¹³C/¹⁵N-Tat1−72 are recovered from cells grown in 1 L of TB and minimal medium, respectively. Attempts to remove the hexahistidine (6×His) affinity tag by thrombin cleavage, followed by re-purification on the metal affinity column, resulted in a significant loss of protein. Possible reasons for the problems associated with thrombin cleavage are that the protein contains a potential internal thrombin cleavage site (see Fig. 4.1) between Lys-61 and Ala-62 [232] and that it contains a possible thrombin inhibitory segment, Arg-Pro-Pro (residues 76–78) [233]. NMR analysis [166] (below) suggests that there is no interaction between the affinity tag and any other segment of the protein, and this has generally been found to be the case for a large number of proteins containing polyhistidine purification tags [234].
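The conversion from an A280 reading to a protein yield follows the Beer-Lambert law. The short Python sketch below illustrates the arithmetic using the calculated extinction coefficient quoted above and the calculated mass of the unlabelled His-tagged protein (10,509.076 Da); the absorbance, path length and volume in the example are hypothetical.

```python
EPSILON_280 = 9090.0     # M^-1 cm^-1, calculated value quoted in the text
MOLAR_MASS = 10509.076   # g/mol, calculated mass of unlabelled His-tagged Tat1-72

def protein_concentration(a280, path_cm=1.0, epsilon=EPSILON_280):
    """Molar concentration (mol/L) from the absorbance at 280 nm."""
    return a280 / (epsilon * path_cm)

def protein_mass_mg(a280, volume_ml, path_cm=1.0):
    """Protein mass (mg) contained in volume_ml of solution."""
    molar = protein_concentration(a280, path_cm)
    return molar * MOLAR_MASS * volume_ml   # (mol/L) * (g/mol) = g/L = mg/mL

# Hypothetical reading: A280 = 0.909 in a 1 cm cell, 10 mL of dialysate.
print(protein_concentration(0.909))   # about 1.0e-4 M
print(protein_mass_mg(0.909, 10.0))   # about 10.5 mg
```

For the isotopically labelled protein the molar mass would be higher, so the same absorbance corresponds to a slightly larger mass.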

  1  MGSSHHHHHH SSGLVPRGSH  20
 21  MEPVDPRLEP WKHPGSQPKT  40
 41  ACTNCYCKKC CFHCQVCFIT  60
 61  KALGISYGRK KRRQRRRPPQ  80
 81  GSQTHQVSLS KQ          92

Figure 4.1: Amino acid sequence of His-tagged Tat1−72 . The affinity tag residues are shown in normal face and Tat residues are shown in bold face.
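Several counts used later in this chapter follow directly from the sequence in Figure 4.1. As a small Python sketch (variable names are arbitrary), the number of observable backbone amides is the chain length minus the prolines and the N-terminal residue, and the cysteine positions can be listed in the numbering of the His-tagged construct.

```python
SEQUENCE = (
    "MGSSHHHHHHSSGLVPRGSH"   # residues 1-20
    "MEPVDPRLEPWKHPGSQPKT"   # residues 21-40
    "ACTNCYCKKCCFHCQVCFIT"   # residues 41-60
    "KALGISYGRKKRRQRRRPPQ"   # residues 61-80
    "GSQTHQVSLSKQ"           # residues 81-92
)

n_residues = len(SEQUENCE)
n_pro = SEQUENCE.count("P")
# Prolines have no amide proton and the N-terminal amino group is not observed.
n_observable = n_residues - n_pro - 1
cys_positions = [i + 1 for i, aa in enumerate(SEQUENCE) if aa == "C"]

print(n_residues, n_pro, n_observable)   # 92 8 83
print(cys_positions)                     # [42, 45, 47, 50, 51, 54, 57]
```

The result of 83 observable amides and seven cysteines matches the numbers quoted in the text.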

4.2 Monomer Identification: MALDI-TOF-MS

As indicated in Figure 4.2, mass spectrometry, and MALDI-MS in particular, is an effective approach to ascertaining both the purity and the oligomeric state of the protein. A significant advantage of MALDI-MS over sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) is that the former method can maintain the protein at a low pH where the cysteine residues are protonated and unreactive. The MALDI-TOF mass spectrum shown in Figure 4.2 indicates that there is one predominant peak at 10,519.8 Da corresponding to the [M+H]⁺ species for the unlabelled His-tagged Tat monomer (calculated MW 10,509.076 Da). A second, less intense peak at 5256.5 Da is likely the [M+2H]²⁺ peak. These two peaks corresponding to the monomeric Tat protein make up 89% of the total intensity of the non-matrix related peaks. The additional weak peaks at 7376.4, 21028.7 and 31345.6 Da are most likely the [2M+3H]³⁺, [2M+H]⁺, and [3M+H]⁺ species, respectively. Similar peaks are often observed in MS and are usually ascribed to non-covalent protein oligomer formation mediated by interactions between basic residues (Arg, Lys, and His) and acidic residues (Asp and Glu) in proteins [235]. The low intensity of these peaks in the present spectrum may be explained by the high net positive charge on Tat at low pH, which suggests that there is minimal Coulombic attraction between the proteins.

Figure 4.2: MALDI-TOF-MS identification of monomeric unlabelled His-tagged Tat1−72. Labelled peaks: [M+H]⁺, [M+2H]²⁺, [2M+3H]³⁺, [2M+H]⁺ and [3M+H]⁺; horizontal axis: m/z.
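The nominal m/z expected for a protonated monomer or oligomer can be computed from the calculated monomer mass. The following Python sketch shows the arithmetic only; the function name is hypothetical, and peak positions measured in a MALDI-TOF spectrum will differ from these nominal values according to the calibration and resolution of the instrument.

```python
PROTON_MASS = 1.007276  # Da

def mz(n_monomers, charge, monomer_mass):
    """Nominal m/z of the ion [nM + zH]z+ for a neutral monomer mass in Da."""
    return (n_monomers * monomer_mass + charge * PROTON_MASS) / charge

M = 10509.076  # calculated monomer mass quoted in the text (Da)

for label, n, z in [("[M+H]+", 1, 1), ("[M+2H]2+", 1, 2),
                    ("[2M+H]+", 2, 1), ("[3M+H]+", 3, 1)]:
    print(f"{label:10s} {mz(n, z, M):10.1f}")
```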

4.3 NMR Spectroscopy and Resonance Assignments

Dissolution of freeze-dried protein dialysate at pH 4 usually yielded solutions free of visible precipitate and free of suspended material based on the near UV absorption spectra. The natural abundance ¹H/¹⁵N-HSQC spectrum of unlabelled Tat1−72 shown in Figure 4.3 shows 64 of the 83 observable amide backbone resonances (non-proline and non-N-terminal) as well as 9 peaks corresponding to the Arg, Gln, and Asn side chain resonances. The ¹H/¹⁵N-HSQC spectrum of the ¹³C/¹⁵N-labelled Tat1−72 in Figure 4.4 shows the same peaks as in the unlabelled protein, the backbone resonances missing from the unlabelled sample, and some additional weaker resonances that correspond to backbone residues that are undergoing slow conformational exchange (ms-s range). In general, both samples show cross-peaks regionally clustered in a manner typical of denatured or disordered proteins: a Gly region, a Ser/Thr region and a region containing the rest of the backbone amides [204]. The spectral dispersion of the resonances is also typical for proteins lacking regular secondary structure in that all backbone resonances lie within a 1.1 ppm range in the ¹H dimension and within 20 ppm in the ¹⁵N dimension [236, 237].

Figure 4.3: Amide backbone region of a ¹H/¹⁵N-HSQC spectrum (192 scans) for naturally abundant ¹⁵N in unlabelled His-tagged Tat1−72 acquired on a Varian INOVA 600 MHz spectrometer at pH 4.1 and 293 K. Axes: HN (ppm) and N (ppm).

Figure 4.4: (a) ¹H/¹⁵N-HSQC spectrum of ¹³C/¹⁵N-labelled His-tagged Tat1−72 at pH 4.1 and 293 K recorded on a Varian INOVA 600 MHz spectrometer. Backbone amide region with assignment of 80 of the 83 non-proline and non-N-terminal resonances (side-chain Asn and Gln NH₂ resonances are outlined with a solid ellipse and the side-chain amide resonances of Arg are outlined with a dashed ellipse). Inset region shows the three peaks associated with the side chain of the single Trp residue (Trp-31). (b) Expanded region of (a) in dashed rectangle. Cys residues in (a) and (b) are shown in bold face. Axes in both panels: HN (ppm) and N (ppm).

The observed chemical shifts of the cross-peaks do not differ significantly between the unlabelled (Fig. 4.3) and ¹³C/¹⁵N-labelled (Fig. 4.4(a)) samples, indicating that the proteins are in the same conformational state. Peaks that are missing in the spectrum of the unlabelled protein correspond to those that are of relatively weak intensity in the ¹³C/¹⁵N-labelled protein and are therefore absent due to the sensitivity limitations of the natural abundance experiment. Many of the missing peaks in the natural abundance HSQC spectrum correspond to amide backbone resonances in the Cys-rich and core regions of the protein [166]. These resonances are the weakest in the spectrum and in some cases are associated with multiple cross-peaks observed in the HSQC spectrum of the ¹³C/¹⁵N-labelled Tat. The multiple cross-peaks may indicate conformational exchange on the µs-ms timescale in these regions, indicating transient structure formation, which may only become stabilized in the presence of zinc ions or upon binding to TAR, cyclin T1, or other binding partners.

In addition to the chemical shift dispersion of resonances, the ¹H/¹⁵N-HSQC spectra of Tat in Figs. 4.3 and 4.4 show that the peaks exhibit a range of intensities, with nearly all the weak and medium intensity cross-peaks falling in the sequence between Cys-47 and Leu-63, as seen in the intensity profile depicted in Figure 4.5. The weakness in intensity in this range of the sequence suggests that this region of the protein is likely undergoing conformational exchange on the µs-ms timescale, indicating possible transient structure formation.


Figure 4.5: Relative intensities of the amide backbone resonances from a ¹H/¹⁵N-HSQC spectrum of His-tagged Tat1−72 at pH 4.1 and 293 K, plotted against residue number. The solid horizontal line denotes the mean relative intensity of 0.42 (s.d. = ±0.26).

The ¹H/¹⁵N-HSQC spectrum in Figure 4.4 reveals 40 more cross-peaks (mostly very weak) than can be accounted for by the backbone and side-chain atoms of a 92-residue protein. Seventeen residues have multiple cross-peaks which have been sequentially assigned (indicated with the designation ‘a’ in Table A.1, Appendix A); some of the peaks were assigned to amino acid identity only, and some could not be unambiguously assigned (see Table A.2 in Appendix A). Thirteen of the unambiguously assigned minor resonances fall in the region spanned by residues Cys-45 to Arg-69. One example of the multiplicity of cross-peaks is shown for the single Trp at position 31 (see Fig. 4.4, inset), which exhibits one strong and two weaker side-chain indole amine cross-peaks. The Trp is preceded by a Pro, so one of the two minor peaks could arise from Trp bonded to a Pro cis-peptide bond isomer, but this is unlikely (see Ch. 5). Another possible explanation is that some of the minor peaks are due to the presence of minor amounts of oxidized Cys residues that are unobserved in the lower-resolution three-dimensional experiments.

Another interesting example of peak multiplicity is Gly-64, which exhibits two amide cross-peaks of approximately equal intensity (Figures 4.3 and 4.4). The nearest proline to Gly-64 is separated from it by 14 residues, ruling out cis-trans proline isomerism as an explanation of the peak multiplicity. This suggests that some segments of the reduced, monomeric Tat protein exist in multiple conformations that are in slow equilibrium on the chemical shift time scale (ms-s). In the case of Gly-64, the two resonances have comparable intensity, suggesting equal populations of two conformers, whereas in many other cases one resonance is significantly more intense than those arising from alternate conformers, suggesting one dominant conformer and minor alternates. Gly-64 is located between Leu and the β-branched Ile and it is possible that steric crowding could locally restrict the dynamics of the Gly amide. Since the intensities of the duplicate peaks vary across the sequence, the conformers populated most likely arise from local interactions, as expected in a disordered protein.

Sequential assignments of ¹HN, ¹⁵N, C′, Cα, Hα, and Cβ resonances were done entirely with 3D heteronuclear triple resonance experiments that use one- and two-bond scalar couplings to connect the atoms [169]. These experiments take advantage of the comparatively wide chemical shift dispersions of ¹⁵N and ¹³C resonances in unfolded proteins [238, 239]. All the backbone resonances were sequentially assigned except for Met-1, Met-21, Phe-52, Phe-58, Arg-77, and Pro-78. The Met-1 amino and Gly-2 amide protons exchange too rapidly to be observed. Resonances from Phe-52/58 could be assigned to residue type only, because they are both preceded by weak Cys resonances. Arg-77 and Pro-78 are part of the difficult sequence Arg-Arg-Arg-Pro-Pro and could not be unambiguously sequentially assigned. Of the assigned Pro resonances, all but one have Cβ chemical shifts characteristic of the trans peptide. The Cβ shifts of Pro-38 are outside the canonical chemical shifts for the trans configuration [204] but are nearer to the trans than they are to the cis configuration. As an example, parts of the HN(CA)CO and HNCACB experiments used for the backbone assignment of residues Gln-83–Leu-89 are shown in Figure 4.6. The assignments are listed in Table A.1 in Appendix A.

Figure 4.6: Strip plots extracted from three-dimensional, amide-detected heteronuclear NMR experiments for backbone assignment. Inter- and intra-residual correlations are obtained from (a) an HN(CA)CO [197] spectrum correlating HN(i) and N(i) with C′(i) and C′(i-1) resonances; and (b) an HNCACB [194] spectrum correlating HN(i) and N(i) with Cα(i), Cα(i-1), Cβ(i), and Cβ(i-1) resonances. A segment of the His-tagged Tat1−72 at pH 4.1 and 293 K is shown depicting connectivity between residues 83–89. Correlations in (a) are shown with long dashed lines; in (b) the Cα correlations are connected with solid lines and correlations of Cβ are connected with short dashed lines. Both experiments were recorded on a Varian INOVA 600 MHz spectrometer.

The assignments of the Cys residues (shown in bold face in Fig. 4.4) are particularly informative as they confirm that all of the Cys residues are reduced; all of the Cα and Cβ chemical shifts shown in Figure 4.7, observed in the three-dimensional HNCACB [194] and CBCA(CO)NH [195] spectra, are in the range of the random coil chemical shifts of reduced cysteine (58.6 ppm and 28.3 ppm) [204], differing significantly from those of oxidized cysteine (55.6 ppm and 41.2 ppm) [204] involved in disulfide bond formation. The chemical shifts of the Cys residues thus confirm the findings from the MALDI-TOF-MS analysis that the protein is in the reduced monomeric state and that the weak peaks in the mass spectrum of Figure 4.2 most likely indicate the presence of non-covalent oligomers formed during the MS analysis.
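The comparison against the reduced and oxidized reference shifts can be expressed as a nearest-reference classification. The Python sketch below is illustrative only: the function name is hypothetical, the example shifts are made up rather than taken from Table A.1, and a simple distance in (Cα, Cβ) space is an assumed criterion rather than the procedure used in this work.

```python
import math

# Random coil reference shifts (ppm) quoted in the text: (C-alpha, C-beta).
REDUCED_CYS = (58.6, 28.3)
OXIDIZED_CYS = (55.6, 41.2)

def classify_cys(ca_ppm, cb_ppm):
    """Label a Cys as 'reduced' or 'oxidized' by the nearest reference pair."""
    d_red = math.hypot(ca_ppm - REDUCED_CYS[0], cb_ppm - REDUCED_CYS[1])
    d_ox = math.hypot(ca_ppm - OXIDIZED_CYS[0], cb_ppm - OXIDIZED_CYS[1])
    return "reduced" if d_red < d_ox else "oxidized"

# Hypothetical shifts for illustration.
print(classify_cys(58.2, 28.0))   # reduced
print(classify_cys(55.0, 40.5))   # oxidized
```

The Cβ shift carries most of the discrimination, since the two reference values differ by about 13 ppm in Cβ but only 3 ppm in Cα.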

Figure 4.7: Strip plots for cysteine residues from 3D HN-detected HNCACB [194] and CBCA(CO)NH [195] spectra of the ¹³C/¹⁵N-labelled His-tagged Tat1−72. The HNCACB spectrum correlates each amide HN(i) with its attached N(i) and the Cα and Cβ of the (i) and (i-1) residues. The corresponding strips from the CBCA(CO)NH spectrum correlate each amide HN(i) with its attached N(i) and the Cα and Cβ of the preceding (i-1) residue only. Both spectra were recorded on the same sample at pH 4.1 and 293 K on a Varian INOVA 600 MHz spectrometer. Strips are shown for C42, C45, C47, C50, C51, C54 and C57; horizontal axis: ¹H (ppm).

4.4 Chemical Shifts and ³JHNHα Coupling Constants

The NMR chemical shift is a sensitive indicator of conformation, and assignment of backbone chemical shifts permits an analysis of secondary structure by comparison to random coil values corrected for local sequence effects [203, 240]. Consensus multinuclear (C′, Cα, Cβ, and Hα) chemical shift indexing (CSI) [241] (data not shown) suggests that the reduced Tat protein at pH 4.1 exists in a random coil conformation. Only three residues (Cys-54, Cys-55, and Cys-56) indicate a tendency towards α-helical conformation, but since one turn of an α-helix consists of 3.6 amino acids, at least four consecutive residues are required for identification of an α-helix [7]. It is also possible that these three residues constitute a short ‘turn’ or a nascent helix [203]. However, the consensus CSI calculations are not corrected for local sequence effects. Examination of the individual chemical shift difference plots shown in Figure 4.8, corrected for the sequence effects on the chemical shifts [203, 204], indicates that a majority of the resonances are within the random coil range, and that rarely are there more than 3 consecutive resonances in either the α-helix or β-sheet chemical shift ranges. However, among the HN (Fig. 4.8(a)) and Hα (Fig. 4.8(d)) resonances there appears to be a slight weighting of the conformations toward the α-helix, the most consistent classification being for the segment around Glu-29. Unlike some other denatured proteins [211, 242, 243], there is less evidence of a tendency to the β-sheet conformation, perhaps because of a lack of hydrophobic β-branched amino acids in Tat1−72 (two Ile and four Val).

The conclusions based on the chemical shift differences are supported by the uncorrected (for Hα relaxation) ³JHNHα measurements, which are all in the range of 5.5–7.1 Hz with a mean value of 6.7 Hz, characteristic of unfolded molecules [244]. The results are shown in Figure 4.8(g), in which the differences of the measured values from random coil values corrected for sequence effects of the preceding residue (β-branched or aromatic) according to Penkett et al. [205] and Smith et al. [244] are presented (Gly and Pro residues are omitted). They show that the entire polypeptide is undergoing rapid sampling of the α-helix and β-sheet regions of Ramachandran space with a slight preference for the α-helix in most segments.
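The rule that at least four consecutive helix-type indices are needed before a helix is called can be sketched as a run-length search. In the Python below, the per-residue labels ('H' for helix-type, 'B' for strand-type, 'C' for coil) and the function name are hypothetical conventions chosen for the example, not the output format of the CSI program.

```python
def find_runs(labels, target="H", min_run=4, first_residue=1):
    """Return (start, end) residue numbers of runs of `target` of length >= min_run."""
    runs = []
    start = None
    for i, label in enumerate(labels):
        if label == target and start is None:
            start = i                      # a run begins here
        elif label != target and start is not None:
            if i - start >= min_run:       # the run ended at index i - 1
                runs.append((start + first_residue, i - 1 + first_residue))
            start = None
    # Handle a run that extends to the end of the sequence.
    if start is not None and len(labels) - start >= min_run:
        runs.append((start + first_residue, len(labels) - 1 + first_residue))
    return runs

print(find_runs(list("CCHHHCC")))   # [] : three in a row is not enough
print(find_runs(list("CHHHHC")))    # [(2, 5)]
```

Under this rule, an isolated stretch of three helix-type indices, as found here, is not classified as an α-helix.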

Figure 4.8: Chemical shift difference plots of: (a) HN, (b) N, (c) C′, (d) Hα, (e) Cα and (f) Cβ. (g) Difference plot of the ³JHNHα coupling constant from the random coil coupling constant (corrected for sequence effects of the preceding residue). The random coil values for HN, N, C′, Cα, and Hα have been adjusted for sequence dependence [203] (correction factors for Cβ are unavailable). Reference lines in plots (a)–(f) correspond to thresholds where chemical shift differences reflect secondary structure formation. The plot ranges in (a)–(f) correspond to two standard deviations from the mean value determined from the chemical shift tables in the BioMagResBank database (URL: www.bmrb.wisc.edu). In each panel the difference (in ppm for (a)–(f), in Hz for (g)) is plotted against residue number.

Since there is a lack of long range homonuclear ¹H-¹H NOEs for Tat, it is not possible to obtain a high resolution structure of the protein. However, it is possible to obtain an initial representation of the extended structure of Tat using the THRIFTY web server [245], which generates a PDB structure file of atomic coordinates based on the measured backbone chemical shifts. The resulting PDB structure file can provide an estimate or ‘snapshot’ of the state of the protein through comparison of its chemical shifts to those observed for other proteins in the PDB. A space-filled model based on the THRIFTY generated PDB file for His-tagged Tat1−72 is shown in Figure 4.9. The resulting model shows Tat to be in an extended disordered state containing a few short turns at residues Pro-30 to Lys-39 and Cys-42 to Lys-48.

Figure 4.9: Space-filled model of the extended disordered form of His-tagged Tat1−72 at pH 4.1 and 293 K, determined using the backbone chemical shifts and the THRIFTY web server [245] generated PDB structure file (left-to-right N-terminal to C-terminal). Regions of the protein are coloured according to: His-tag (residues 1-20) in light grey; acidic or Pro-rich region (residues 21-41) in red with the exception of Glu-29 (dark grey) and Trp-31 (olive); Cys-rich region (residues 42-57) in yellow; core (residues 58-67) in purple; basic region (residues 68-77) in blue; and C-terminal Gln-rich region (residues 78-92) in orange. Model generated using the MacPyMol molecular graphics system, version 0.99 [246].

4.5 NMR Relaxation

Relaxation data (R1, R1ρ and heteronuclear steady-state NOEs) were measured for 64/83 observable (non-proline and non-N-terminal) resonances at 600 MHz (60/83 at 800 MHz). Sample spectra for the saturation and no-saturation steady-state heteronuclear NOE are shown for His-tagged Tat1−72 in Figure 4.10. The steady-state ¹H-¹⁵N NOE values were obtained (as described in Section 3.9) from ratios of peak heights from experiments with (I_NOE) and without (I_noNOE) saturation of the protons for 5 s at the beginning of the experiment. The heteronuclear NOE values were then obtained from (I_NOE − I_noNOE)/I_noNOE. As indicated in Figure 4.11(a), the steady-state heteronuclear NOEs measured at 600 MHz and 800 MHz exhibit a relatively featureless, flattened bell-shaped variation with amino acid sequence, as expected for an unfolded protein [247]. The observed NOEs range from −3.3 to −0.60 at 600 MHz (−2.6 to −0.41 at 800 MHz), with mean values of −1.27 and −0.933, respectively. For comparison, an average NOE of about −0.2 is observed for several folded proteins with similar lengths of polypeptide chain [248–250]. The more negative NOE values for Tat indicate much less restricted dynamics on the ns-ps timescales than for folded proteins. The ends of Tat exhibit the most negative values, indicative of faster dynamics; the values increase gradually away from the C-terminus, whereas the increase away from the N-terminus is steeper. Significant deviations from the average values are observed for Thr-43, Lys-61 and Ala-62.
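The NOE expression above is a one-line calculation per residue. As a minimal Python sketch (the function name and the peak heights are hypothetical), a saturated-spectrum peak that is inverted and 27% as tall as the reference peak gives an NOE of −1.27.

```python
def heteronuclear_noe(i_noe, i_no_noe):
    """Steady-state NOE from peak heights with and without proton saturation."""
    return (i_noe - i_no_noe) / i_no_noe

# Hypothetical peak heights in arbitrary units.
print(heteronuclear_noe(-27.0, 100.0))   # -1.27
print(heteronuclear_noe(80.0, 100.0))    # -0.2, typical of a folded protein
```

In this convention, a peak that changes sign on saturation yields a value below −1.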

Figure 4.10: Sample spectra for the steady-state heteronuclear ¹H-¹⁵N NOE of His-tagged Tat1−72 at pH 4.1 and 293 K. (a) No saturation period (noNOE) and (b) 5 s saturation period (NOE). Spectra were recorded on a Varian INOVA 600 MHz spectrometer. Axes in both panels: HN (ppm) and N (ppm).

Figure 4.11: Relaxation measurements of the His-tagged Tat1−72 protein at pH 4.1 and 293 K, determined at 14.1 T and 18.8 T field strengths (plotted with different symbols) for: (a) heteronuclear ¹H-¹⁵N NOE, (b) longitudinal relaxation, R1 (s⁻¹), (c) rotating-frame longitudinal relaxation, R1ρ (s⁻¹), and (d) R1ρ data at 14.1 T field strength plotted along with the predicted behaviour for a random-coil polymer of uniform composition (solid line) and the variation in relaxation when residue contributions are weighted by residue volume (dashed line) [211, 218]. All panels are plotted against residue number.

The R1 measurements show a similar bell-shaped profile with two notable features (Fig. 4.11(b)). The R1 values range from 0.75 s⁻¹ (0.89 s⁻¹) to 1.91 s⁻¹ (1.92 s⁻¹) with mean values of 1.45 s⁻¹ (1.43 s⁻¹) at 14.1 T (18.8 T) field strength. Several folded proteins of similar size show slightly higher average R1 values, on the order of 1.5 s⁻¹, at 14.1 and 17.6 T field strengths [249, 251]. The slower relaxation in Tat indicates a shorter rotational correlation time and faster dynamics on the ns-ps timescale than for a folded protein. As with the NOE values, the R1 rates decline near the ends of the protein, and more steeply at the N-terminus than at the C-terminus. The lowest rates, apart from the termini, are found in the segment connecting the Cys-rich and basic regions, between Thr-60 and Ser-66, and suggest fast dynamics there. Examples of the T1 and T1ρ relaxation series spectra are shown in Figure 4.12, along with examples of the exponential fits of the relaxation times for Gly-68 in Figures 4.13 and 4.14.
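The conversion from a relaxation series to a relaxation time and rate can be sketched as follows. This is a simplified illustration rather than the fitting procedure used for the thesis: it assumes a mono-exponential decay I(t) = I0·exp(−t/T) and fits a straight line to ln(I), whereas a weighted non-linear fit would normally be preferred for noisy data. The delays and amplitudes below are hypothetical.

```python
import math


def fit_monoexponential(delays_ms, intensities):
    """Fit I(t) = I0 * exp(-t / T) by linear least squares on ln(I).

    Returns (I0, T) with T in the same units as the delays.
    All intensities must be positive.
    """
    xs = list(delays_ms)
    ys = [math.log(value) for value in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -1 / T
    intercept = mean_y - slope * mean_x    # equals ln(I0)
    return math.exp(intercept), -1.0 / slope


# Hypothetical, noise-free series generated with T1 = 658 ms.
delays = [50, 100, 200, 400, 800, 1600]
amplitudes = [math.exp(-t / 658.0) for t in delays]
i0, t1_ms = fit_monoexponential(delays, amplitudes)
r1 = 1000.0 / t1_ms  # rate in s^-1
print(f"T1 = {t1_ms:.0f} ms, R1 = {r1:.2f} s^-1")
```

The same function applies to T1ρ and T2 series, since only the delay list and the measured amplitudes change.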

Figure 4.12: Sample spectra for (a) T1 (50 ms relaxation time) and (b) T1ρ (30 ms relaxation time) relaxation series for His-tagged Tat1−72 at pH 4.1 and 293 K. Spectra were recorded on a Varian INOVA 600 MHz spectrometer. [Two-dimensional spectra; axes: HN (ppm) and N (ppm).]

Figure 4.13: Sample fits for T1 of Gly-68 measured at (a) 14.1 T and (b) 18.8 T field strengths. [Plots of amplitude against relaxation time (ms); fitted values: (a) T1 = 658 ± 1 ms, (b) T1 = 652 ± 2 ms.]

Figure 4.14: Sample fits for T1ρ of Gly-68 measured at (a) 14.1 T and (b) 18.8 T field strengths. [Plots of amplitude against relaxation time (ms); fitted values: (a) T1ρ = 392 ± 1 ms, (b) T1ρ = 363 ± 1 ms.]

The rotating-frame longitudinal relaxation rates (R1ρ) measured for Tat at 14.1 T and 18.8 T fields are plotted in Figure 4.11(c). The R1ρ rates range from 1.5 s⁻¹ (1.3 s⁻¹) to 5.9 s⁻¹ (7.2 s⁻¹) with mean values of 3.26 s⁻¹ (3.29 s⁻¹) at 14.1 T (18.8 T) field strength. The R2 rates measured at 14.1 T (Fig. 4.15) range from 1.6 s⁻¹ to 7.1 s⁻¹ with an average value of 3.5 s⁻¹. The differences between the R2 and the R1ρ rates presumably arise from contributions to the former from slow conformational exchange. An example of the T2 relaxation series is shown in Figure 4.16 and the exponential fit of the data for Gly-68 is shown in Figure 4.17.

Figure 4.15: Transverse relaxation rates (R2) for His-tagged Tat1−72 at pH 4.1 and 293 K determined at 14.1 T field strength. [Plot of R2 (s⁻¹) against residue number.]

Figure 4.16: Sample spectrum for T2 (50 ms relaxation time) relaxation series for His-tagged Tat1−72 at pH 4.1 and 293 K. Spectrum recorded on a Varian INOVA 600 MHz spectrometer (14.1 T field). [Two-dimensional spectrum; axes: HN (ppm) and N (ppm).]

Figure 4.17: Sample fit for T2 of Gly-68 measured at 600 MHz. [Plot of amplitude against relaxation time (ms); fitted value: T2 = 323 ± 6 ms.]

In general, low R1ρ and R2 measurements indicate unrestricted fast dynamics whereas high values suggest restricted fast dynamics and possible contributions from slow conformational exchange [250]. In folded proteins of similar length to Tat the R2 values in the absence of exchange are on the order of 8 s⁻¹ [249]. The low R1ρ and R2 values and the negative NOE values measured for Tat indicate large-amplitude fluctuations on the ns-ps timescale characteristic of a random coil-like conformation.

The R1ρ relaxation data obtained at the 14.1 T field were fit to equation (3.17), in which the influence of neighbouring residues is modelled as a decaying exponential [211]. The flattened, bell-shaped solid curve (Fig. 4.11(d)) shows the behaviour predicted for a random-coil polymer of uniform composition. The dashed line shows the variation in relaxation when residue contributions are weighted by residue volume [218]. Although a number of individual residues deviate from the volume-weighted model, overall the theoretical line follows the data fairly closely and is a much better fit than the uniform polymer model. The minima in the model correspond mainly to the small flexible residues Gly, Ala, and Ser whereas the maxima are found at the positions of Trp, Arg, and Lys. In contrast to some other applications of this model to denatured proteins [211, 252, 253], there are no obvious regions with large positive deviations from the theoretical curve, which is further evidence that reduced Tat1−72 at pH 4.1 is predominantly disordered.

The segment from Pro-23 to Pro-38 contains five prolines and the measured R1ρ values for most residues in this region are greater than the calculated ones. This suggests that the prolines restrict dynamics on the ms-µs timescale and stiffen the backbone in this region. In the C-terminus, from residue 60 onwards, the measured values generally fall below the calculated ones, suggesting greater flexibility in this region of the molecule. One exception is the high value for Gly-64, suggesting restricted motion and slow exchange at this position [253]. One final observation is that the region of the protein spanning residues 45-60 (Cys-rich region and core) contains the fewest dynamics measurements, because the peak intensities in this region are low. The largest number of assigned minor peaks is also found in this segment (see Table A.2 in Appendix A), supporting the suggestion that some residues in this segment undergo slow conformational exchange and are the most likely sites of folding nuclei.
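The bell-shaped curve for a polymer of uniform composition can be illustrated with a short sketch. Equation (3.17) is not reproduced in this section, so the form below is an assumption: each residue's rate is taken as an intrinsic rate multiplied by a sum of exponentially decaying contributions from every residue in the chain. The chain length, intrinsic rate and persistence length are hypothetical, and the volume-weighted variant is not shown.

```python
import math


def segmental_profile(n_residues, r_intrinsic, persistence):
    """Predicted rate per residue for a uniform random-coil chain.

    Assumed form: R(i) = r_intrinsic * sum_j exp(-|i - j| / persistence),
    summed over all residues j in the chain (including j = i).
    """
    profile = []
    for i in range(1, n_residues + 1):
        total = sum(
            math.exp(-abs(i - j) / persistence)
            for j in range(1, n_residues + 1)
        )
        profile.append(r_intrinsic * total)
    return profile


# Hypothetical parameters: 92 residues, 0.25 s^-1 intrinsic rate,
# neighbour influence decaying over about 7 residues.
rates = segmental_profile(92, 0.25, 7.0)
print(f"N-terminus {rates[0]:.2f}, centre {rates[45]:.2f}, "
      f"C-terminus {rates[-1]:.2f} s^-1")
```

Residues near the chain ends have neighbours on one side only, so their summed contribution is smaller, which produces the flattened bell shape without any residue-specific terms.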

4.6 Spectral Density Mapping

The relaxation measurements carried out at two field strengths allowed the mapping of the spectral density functions at five frequencies: 0, 61, 81, 522 and 696 MHz, where the latter two frequencies are 0.87 times the ¹H Larmor frequencies for the 14.1 T and 18.8 T magnetic fields. The spectral density functions at these frequencies are plotted in Figure 4.18. The high-frequency values make a small, relatively uniform contribution to the relaxation across the sequence, except at the N-terminus where a significant increase in high-frequency motions is observed for the first 10 residues at 522 MHz (Fig. 4.18(a)).
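The mapping itself can be sketched with the widely used reduced spectral density expressions. The thesis equations (3.8)–(3.14) are not reproduced in this section, so the expressions, the N-H bond length (1.02 Å) and the ¹⁵N chemical shift anisotropy (−160 ppm) below are assumptions, and the function name is hypothetical. The NOE argument follows the definition used in this chapter, (I_NOE − I_noNOE)/I_noNOE.

```python
import math

MU0_OVER_4PI = 1.0e-7        # T m A^-1
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1
GAMMA_N = -2.7126180e7       # rad s^-1 T^-1
R_NH = 1.02e-10              # m, assumed N-H bond length
CSA = -160e-6                # assumed 15N chemical shift anisotropy


def reduced_spectral_densities(r1, r2, noe_enhancement, b0):
    """Return (J(0), J(wN), J(0.87wH)) in s/rad at field b0 (tesla).

    r1, r2 are in s^-1 (R1rho may stand in for R2); noe_enhancement is
    (I_NOE - I_noNOE) / I_noNOE.
    """
    d = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH ** 3
    c = GAMMA_N * b0 * CSA / math.sqrt(3.0)
    sigma = r1 * noe_enhancement * GAMMA_N / GAMMA_H  # cross-relaxation rate
    denom = 3.0 * d * d + 4.0 * c * c
    j_high = 4.0 * sigma / (5.0 * d * d)
    j_n = (4.0 * r1 - 5.0 * sigma) / denom
    j_0 = (6.0 * r2 - 3.0 * r1 - 2.72 * sigma) / denom
    return j_0, j_n, j_high


# Sequence-mean rates quoted above for 14.1 T, used only as an illustration.
j0, jn, jh = reduced_spectral_densities(1.45, 3.26, -1.27, 14.1)
print(f"J(0) = {j0 * 1e9:.2f}, J(wN) = {jn * 1e9:.2f}, "
      f"J(0.87wH) = {jh * 1e9:.3f} ns/rad")
```

Feeding in the mean rates gives values of roughly 0.7, 0.24 and 0.03 ns/rad, the same order as the per-residue means reported for this field, although a mean of rates is not expected to reproduce a mean of per-residue spectral densities exactly.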

Figure 4.18: Reduced spectral density mapping of motions for His-tagged Tat1−72 at pH 4.1 and 293 K at frequencies: (a) 522 MHz and 696 MHz, based on estimation of J(0.87ωH); (b) 61 MHz and 81 MHz; (c) 0 MHz effective spectral density, calculated using the measurements from two fields according to the method of Farrow et al. described in [209]. Average values of the backbone spectral densities are listed in Table 4.1. [Plots against residue number; vertical axes: (a) J(0.87ωH) (ns/rad), (b) J(ωN) (ns/rad), (c) Jeff(0)F (ns/rad).]

Table 4.1: Means with standard deviations and maximum and minimum values of the reduced spectral density mapping for backbone amides of His-tagged Tat1−72 at pH 4.1 and 293 K at 0, 61, 81, 522, and 696 MHz. Averages correspond to the 58 residues that are common to relaxation measurements at both 600 and 800 MHz.

J(ω)            µ ± s.d. (ns/rad)    Max. (ns/rad)    Min. (ns/rad)
Jeff(0)F (a)    1.05 ± 0.48          2.1              0.44
Jeff(0) (b)     0.71 ± 0.30          1.59             0.28
J(61)           0.27 ± 0.04          0.35             0.12
J(81)           0.22 ± 0.03          0.30             0.13
J(522)          0.031 ± 0.008        0.068            0.017
J(696)          0.023 ± 0.006        0.049            0.011

(a) Calculated using two fields by the method of Farrow et al. [90].
(b) Mean residue Jeff(0) from averaging the 600 and 800 MHz solutions for Jeff(0) in the reduced spectral density mapping in equations (3.12)–(3.14).

The spectral density profiles at 61 MHz and 81 MHz are highly similar and do show some variation with sequence (Fig. 4.18(b)). The ends of the protein, residues Lys-61 to Leu-63, and Thr-43 exhibit the smallest contributions at mid-frequencies. Interestingly, in the acid-denatured state of apomyoglobin, maxima in buried surface area correlate weakly with maxima in the J(ωN) plot [254], suggesting that J(ωN) is sensitive to the formation of transient folding nuclei. The smaller values of J(ωN) in Tat near Lys-61 and at the termini suggest less restricted motion in these regions.

The low-frequency spectral densities cover a wider range of values but also contain the highest levels of error in comparison to the values calculated at high frequencies (Fig. 4.18(c)). The most notable feature is a local maximum in slow motions centred at the 6×His affinity tag. There are also less well-defined peaks in the proline-rich region (residues 21-41) and in the basic region (residues 68-77). The Cys-rich segment (residues 42-57) contains the fewest measurements, and they are associated with some of the largest errors. These errors arise from the weak peak intensities in this region of the protein, possibly indicating the presence of conformational exchange.

In order to estimate the contribution to relaxation from conformational exchange, Jeff(0) (Fig. 4.18(c)) and Rex were calculated using equations (3.15) and (3.16) from Farrow et al. [90, 209] using relaxation data measured at 14.1 T and 18.8 T field strengths. The conformational exchange rates (Rex) determined by this method are plotted in Figure 4.19. These conformational exchange rates are field dependent and were calculated relative to the lower field of 14.1 T (νH = 600 MHz). The mean exchange rate for all residues observed is 1 s⁻¹ with a maximum value of 3.3 s⁻¹; 49 of the 58 values measured are within one standard deviation of the mean. Thus, for most residues the contribution to R1ρ relaxation from conformational exchange is minor. Moreover, small Rex values cannot be measured accurately using this approach when the data are measured at two similar field strengths (see Ch. 5) [90].
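The logic of a two-field separation of J(0) and Rex can be sketched under stated assumptions. Equations (3.15) and (3.16) are not reproduced in this section, so the sketch below is a derivation from the reduced mapping rather than a transcription: it assumes that Γ = 6R2 − 3R1 − 2.72σ equals (3d² + 4c²)J(0) + 6Rex at each field, and that both c² and Rex scale with the square of the field. All names and input numbers are hypothetical.

```python
def two_field_j0_and_rex(gamma_lo, gamma_hi, b_lo, b_hi, d2, c2_lo):
    """Separate J(0) and Rex (at the lower field) from two-field data.

    gamma_lo, gamma_hi: 6*R2 - 3*R1 - 2.72*sigma at each field (s^-1).
    d2: squared dipolar constant; c2_lo: squared CSA constant at b_lo.
    Assumes gamma(B) = 3*d2*J0 + (B / b_lo)**2 * (4*c2_lo*J0 + 6*Rex_lo).
    """
    kappa = (b_hi / b_lo) ** 2
    # Field-independent part, equal to 3 * d2 * J(0).
    constant_part = (kappa * gamma_lo - gamma_hi) / (kappa - 1.0)
    j0 = constant_part / (3.0 * d2)
    # Field-dependent part at the lower field: 4 * c2_lo * J(0) + 6 * Rex.
    rex_lo = (gamma_lo - constant_part - 4.0 * c2_lo * j0) / 6.0
    return j0, rex_lo


# Synthetic check with hypothetical values: build both gammas from a known
# J(0) and Rex, then recover them.
D2, C2_LO = 5.2e9, 1.25e9
TRUE_J0, TRUE_REX = 7.0e-10, 1.0
KAPPA = (18.8 / 14.1) ** 2
g_lo = 3 * D2 * TRUE_J0 + 4 * C2_LO * TRUE_J0 + 6 * TRUE_REX
g_hi = 3 * D2 * TRUE_J0 + KAPPA * (4 * C2_LO * TRUE_J0 + 6 * TRUE_REX)
j0_fit, rex_fit = two_field_j0_and_rex(g_lo, g_hi, 14.1, 18.8, D2, C2_LO)
print(f"J(0) = {j0_fit * 1e9:.2f} ns/rad, Rex = {rex_fit:.2f} s^-1")
```

Because the result divides by κ − 1, the closer the two fields are, the more strongly measurement errors in the rates are amplified, which is the reason small Rex values are hard to determine this way.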

Figure 4.19: Field-dependent conformational exchange rates (relative to 14.1 T field) for His-tagged Tat1−72 at pH 4.1 and 293 K determined from the method of Farrow et al. [209] using equation (3.16). [Plot of Rex (s⁻¹) against residue number.]

Unlike the results reported in [90] for the 59-residue N-terminal SH3 domain from the adapter protein drk (drkN SH3), where many of the calculated conformational exchange rates are negative (contrary to theory), only one residue for Tat was found to have a negative conformational exchange rate (His-5 of the affinity tag). The explanation for the discrepancy from theory in [90] was that the two field strengths used to measure the T2 times differed by only a factor of 1.2 (using data measured at 11.7 T and 14.1 T), so that the resulting difference between T2 times was relatively small. The authors note that the calculation of Jeff(0) using two fields with equation (3.15) resulted in a four-fold increase in the errors compared to the errors of Jeff(0) determined from single-field calculations. A similar increase in the errors of Jeff(0) determined from the two-field equation (3.15) is found here relative to the single-field measurements in Figures 4.20(b) and 4.20(c). As for drkN SH3 in [90, 209], the increased error observed for Jeff(0) of Tat is likely the result of error propagation in the calculation as well as similarity in the field strengths used for the measurements (they differ by a factor of 1.333). Consequently, the calculation of Rex using the Jeff(0) values from equation (3.15) should be judged cautiously.

The low-frequency spectral density values for His-tagged Tat1−72 determined by equation (3.15) are plotted in Figure 4.20 along with the values determined from the single-field solutions to equations (3.8)-(3.10) and the mean Jeff(0) values from the two single-field measurements. In light of the above discussion on the similarity of the relaxation data when the field strengths are similar, the similarity in the range of values for the R1ρ and R2 data measured at 600 MHz (Figs. 4.11(d) and 4.15) implies that contributions from slow conformational exchange on the ms-µs timescale are not significant for the Tat protein under these conditions. Therefore, the low-frequency spectral density values, Jeff(0), (i) do not differ significantly between the calculations from the two separate fields, as a result of the small to negligible contribution of Rex and the small difference in the field strengths, and (ii) are likely a very close approximation to the actual J(0) values.

Figure 4.20: Jeff(0) spectral density maps determined for His-tagged Tat1−72 at 14.1 T (νH = 600 MHz) and 18.8 T (νH = 800 MHz) field strengths calculated for each field separately and using combined data: (a) Jeff(0)F calculated by the method of Farrow et al. [209]; (b) Jeff(0)600 calculated from 14.1 T field strength data; (c) Jeff(0)800 calculated from 18.8 T field strength data; (d) mean value, Jeff(0), calculated using data from both 14.1 T and 18.8 T field strengths. [Plots against residue number; vertical axes in ns/rad.]

4.7 Model-Free Analysis

To obtain further insight into the dynamics of Tat, the relaxation data were fit using Lipari-Szabo and Cole-Cole model-free methods. The results from the analysis of R-factors (Rf) and AIC values for the eight tested models (listed in Table 3.4) are shown in Table 4.2. Only Models 7 and 8, which use the Cole-Cole distribution model, result in Rf values of less than 0.1. As the Cole-Cole analysis [223, 224] is not widely used, it may be helpful to describe the Lipari-Szabo results first, despite the fact that their Rf values are higher than those for the two Cole-Cole distribution models. The Monte Carlo error estimates were determined for Lipari-Szabo Model 2 using the data from the 600 MHz and 800 MHz measurements independently and combined (2nd, 4th and 6th entries in Table 3.4). Model 3, which includes estimates of the conformational exchange rate, can only be used when the two data sets are combined (since only three data sets were obtained at each field).

Table 4.2: Individual and total R-factors (Rf) along with mean Akaike information criterion (AIC) values for the Lipari-Szabo (Models 1 and 2), Lipari-Szabo extended (Models 4-6) and Cole-Cole distribution (Models 7 and 8) methods listed in Table 3.4.

Model   Rf[R1]   Rf[R1ρ]   Rf[NOE]   Rf      Field(s)            Mean[AIC]   SD[AIC]
1       0.088    0.499     0.400     0.454   600 MHz             4883.59     3915.86
2       0.065    0.298     0.109     0.262   600 MHz             1336.04     2998.99
1       0.031    0.257     0.545     0.268   800 MHz             1790.39     2000.84
2       0.025    0.121     0.142     0.115   800 MHz             184.097     759.199
1       0.110    0.507     0.211     0.453   600 MHz, 800 MHz    9795.57     8310.55
2       0.098    0.249     0.154     0.227   600 MHz, 800 MHz    3959.18     6922.11
3       0.051    0.146     0.135     0.136   600 MHz, 800 MHz    818.647     847.911
4       0.068    0.194     0.143     0.178   600 MHz, 800 MHz    2418.22     6134.28
5       0.244    0.356     0.141     0.329   600 MHz, 800 MHz    5305.13     13643.1
6       0.303    0.254     0.143     0.254   600 MHz, 800 MHz    21316.9     86664.7
7       0.049    0.099     0.138     0.098   600 MHz, 800 MHz    321.115     398.497
8       0.051    0.097     0.131     0.097   600 MHz, 800 MHz    319.896     401.147

The results of the Lipari-Szabo model-free analyses of the relaxation measurements are shown in Figure 4.21 (Model 2) and Figure 4.22 (Model 3), and in Appendix B (Model 2). The following observations were made in all analyses. In general, the field-dependent Rex contributions (determined relative to the 600 MHz field) are less than 2 s⁻¹ for most residues, the largest value being 3.3 s⁻¹. From the Lipari-Szabo based models (Models 2 and 3) the average S² values are 0.58 and 0.50 for calculations with and without Rex, respectively. These results are very similar to analyses of relaxation data in other unfolded proteins, where order parameters in the range of 0.4-0.6 were determined [90, 223]. The 6×His affinity tag contains the residues with the highest order parameters (Figures 4.21(a) and 4.22(a)) but also contains some of the highest errors for all of the model parameter estimates. The His-tag region also exhibited the highest R2 values (Fig. 4.15), suggesting that accounting for conformational exchange or hydrogen exchange, not detectable by R1ρ, may improve the analysis. The τc values show a slight bell-shaped variation with sequence (Figs. 4.21(b) and 4.22(b)), with the correlation times at the protein termini being smaller than in the centre. Furthermore, the average τc values (1.9 and 3.6 ns, calculated with and without Rex, respectively) are barely a factor of 10 greater than the average τe values (0.14 and 0.19 ns; Figs. 4.21(c) and 4.22(c)), and in some cases the errors in the internal and overall correlation times overlap. This lack of stochastic independence in the correlation functions supports the notion that the HIV-1 Tat1−72 protein exists in a disordered or random coil-like conformation in which there is no clear separation of internal and overall rotational correlation times [247].
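The role of the three parameters S², τc and τe can be illustrated with the standard simple Lipari-Szabo spectral density. This is a sketch only: the exact model definitions of Table 3.4 are not reproduced in this section, the 2/5 normalisation is an assumed convention, and the function name is hypothetical. The parameter values are the sequence averages quoted above for the calculation with Rex.

```python
import math


def lipari_szabo_j(omega, s2, tau_c, tau_e):
    """Simple Lipari-Szabo spectral density (s/rad), assumed 2/5 normalisation.

    omega in rad/s; tau_c (overall) and tau_e (internal) in seconds.
    """
    tau = tau_c * tau_e / (tau_c + tau_e)  # 1/tau = 1/tau_c + 1/tau_e
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)


S2, TAU_C, TAU_E = 0.58, 1.9e-9, 0.14e-9
omega_n = 2.0 * math.pi * 60.8e6  # approximate 15N frequency at 14.1 T
j_zero = lipari_szabo_j(0.0, S2, TAU_C, TAU_E)
j_n = lipari_szabo_j(omega_n, S2, TAU_C, TAU_E)
print(f"J(0) = {j_zero * 1e9:.2f} ns/rad, J(wN) = {j_n * 1e9:.2f} ns/rad")
print(f"tau_c / tau_e = {TAU_C / TAU_E:.1f}")
```

The model treats overall and internal motions as independent, which is best justified when τc is much longer than τe; a ratio of only about ten, as here, is what makes the separation questionable for a disordered chain.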

Figure 4.21: Model-free parameter estimates using Model 2 (Rf = 0.227) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields (600 and 800 MHz) using 58 residues common to both data sets. Residues Ser-4, His-5 and Ser-11 were omitted as outliers of the parameter estimates as they failed to converge to a solution. (a) Generalized order parameters S²; (b) local overall rotational correlation times τc (ns); (c) internal correlation times τe (ps). The sequence mean values of the estimates are indicated by the solid lines. [Plots against residue number.]

Figure 4.22: Model-free parameter estimates using Model 3 (Rf = 0.136) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields (14.1 T and 18.8 T) using 58 residues common to both data sets. No residues were omitted as outliers. (a) Generalized order parameters S²; (b) local rotational correlation times τc (ns); (c) internal rotational correlation times τe (ps); (d) field-independent conformational exchange parameters Φex (s/rad²). The sequence mean values of the estimates are indicated by the solid lines. [Plots against residue number; panel (d) vertical axis: Φex (10⁻¹⁷ s/rad²).]

The results of the Cole-Cole model-free analyses using Model 7 (Table 3.4) are shown in Figure 4.23. Compared to the Lipari-Szabo analyses there is a slight increase in the average order parameter to 0.63 as a result of the very high S² values in the affinity tag. In general, there are fewer residues with high S² values across the sequence (Fig. 4.23(a)), but several residues at the N-terminal end of the protein in the His-tag region have values of S² = 1. It is notable that these residues also have the highest errors in all of the estimated parameters. The mean local rotational correlation time (τ0) shows a significant decrease in its range (0.26-4.09 ns), as seen in Figure 4.23(b), compared to the Lipari-Szabo models (0.39-21.80 ns for Model 2 and 0.28-9.07 ns for Model 3), with an average value of 1.21 ns. The rotational correlation time for internal motions (τe) is also lower in its range (0-384 ps) (Fig. 4.23(c)) and mean value (0.1 ns). Here again, the two correlation times differ by barely a factor of 10, which further demonstrates the difficulty of separating the two modes of motion within a disordered protein. Thus, the use of a distribution of local overall correlation times does not appear to have greatly improved the model because most of the distribution width parameter estimates (ε) yield narrow distributions.

Figure 4.23: Model-free parameter estimates using Model 7 (Rf = 0.098) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at two fields (14.1 T and 18.8 T) using 58 residues common to both data sets. Residue His-8 was omitted as an outlier of the parameter estimates as it failed to converge to a solution. (a) Generalized order parameters S²; (b) distribution mean local rotational correlation times τ0 (ns); (c) internal rotational correlation times τe (ps); (d) Cole-Cole distribution width parameters ε. The sequence mean values of the estimates are indicated by the solid lines. [Plots against residue number.]

4.8 pH Effects

The NMR samples used for the sequential assignment and relaxation analyses of Tat were stable for over 1 year at pH 4.1. Figure 4.24 shows the effects of increasing pH on the ¹H/¹⁵N-HSQC spectrum of Tat1−72. The most obvious result is an overall reduction of cross-peak intensities. Fast hydrogen exchange with water might account for this because exchange rates greater than 10³ s⁻¹ will result in the loss of signal intensity from chemical exchange line broadening. To determine whether the observed losses are a result of the increasing rates of hydrogen exchange, the theoretical hydrogen exchange rates for unfolded Tat at various pH values were calculated taking into account the nearest-neighbour inductive and steric effects [230] (Fig. 4.25). The calculated hydrogen exchange rates for His-tagged Tat are predicted to increase on the order of 250-fold between pH 3.3 and 5.8. At pH 5.8, the calculated hydrogen exchange rates range from 0.0015 s⁻¹ for Gln-92 to 63.0 s⁻¹ for Gly-2, and it is likely that for many of the peaks the intensity loss is attributable to rapid hydrogen exchange. For example, the histidines in the affinity tag and the Cys residues are predicted to be the fastest exchanging amides and their cross-peaks diminish early in the pH titration. In these cases, the loss of cross-peaks from the spectra is indirect evidence that a residue is not involved in a stable, folded conformation. However, detailed analysis of the peaks shows that hydrogen exchange alone cannot explain all the peak heights. For example, Thr-40 is predicted to exchange slowly, yet its cross-peak disappears early in the titration (Fig. 4.24). These results suggest that some cross-peaks may lose intensity because of the development of local conformations that are in intermediate exchange on the µs-ms timescale, as observed in the molten globules of other proteins [255].
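The pH dependence of amide exchange can be illustrated with the general acid- and base-catalysed rate law. This is a sketch only: the residue-specific rate constants and neighbour corrections of the method in [230] are not reproduced here, so the constants below are placeholders chosen for illustration, and the value of pKw near 293 K is an assumption.

```python
import math

PKW_ASSUMED = 14.17  # assumed ion product of water near 293 K


def exchange_rate(ph, k_acid, k_base, k_water=0.0, pkw=PKW_ASSUMED):
    """k_ex = k_acid*[H+] + k_base*[OH-] + k_water (units follow the inputs)."""
    h_conc = 10.0 ** (-ph)
    oh_conc = 10.0 ** (ph - pkw)
    return k_acid * h_conc + k_base * oh_conc + k_water


def ph_of_minimum(k_acid, k_base, pkw=PKW_ASSUMED):
    """pH at which the acid and base terms are equal (the rate minimum)."""
    return 0.5 * (pkw + math.log10(k_acid / k_base))


# Placeholder rate constants (M^-1 min^-1), NOT values from the thesis.
K_ACID, K_BASE = 1.0e2, 1.0e10
ph_min = ph_of_minimum(K_ACID, K_BASE)
fold = exchange_rate(5.8, K_ACID, K_BASE) / exchange_rate(3.3, K_ACID, K_BASE)
print(f"rate minimum near pH {ph_min:.2f}; "
      f"increase from pH 3.3 to 5.8: {fold:.0f}-fold")
```

For purely base-catalysed exchange, a rise of 2.5 pH units corresponds to a factor of 10^2.5, about 316; a residual acid-catalysed term at pH 3.3 lowers the ratio, which is consistent in order of magnitude with the roughly 250-fold increase quoted above.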

Figure 4.24: Two-dimensional ¹H/¹⁵N-HSQC spectra of Tat1−72 at 293 K observed at pH 3.3 (red), pH 4.1 (yellow), pH 5.3 (green), pH 5.8 (blue) and pH 6.7 (violet). All samples are approximately 1 mM and were obtained from a single expression/purification. Each spectrum was collected with 32 transients, 2048×256 complex points, and sweep widths of 10 ppm in F2 (¹H) and 24 ppm in F1 (¹⁵N). [Axes: HN (ppm) and N (ppm); labelled cross-peaks: G13, G18, G35, G64, G68, G81, S3, S19, S36, S82, T40, T43 and T84.]

Figure 4.25: Predicted amide hydrogen exchange rates (kex) for His-tagged Tat1−72 at 293 K for pH values (a) 3.3, (b) 4.1, (c) 5.3, (d) 5.8, and (e) 6.7 using the method of Bai et al. [230]. [Plots of predicted kex (min⁻¹) against residue number.]

Several resonances in Figure 4.24 remain resolved with the increase in pH and were examined more closely. These resonances lie in the glycine and serine/threonine region of the HSQC spectra and include Gly-13, Gly-18, Ser-19, Gly-35, Ser-36, Thr-40, Thr-43, Gly-64, Gly-68, Gly-81, Ser-82, and Thr-84. The profiles of the absolute peak heights of each of these residues with increasing pH are given in Figures 4.26 and 4.27. In each profile it is clear that the maximum intensity is achieved at pH 4.1. This pH of maximum intensity is slightly higher than the pH at which the hydrogen exchange rate is at a minimum in model compounds. The plot in Figure 4.28 shows the calculated variation in the net charge of His-tagged Tat1−72 with increasing pH. The net charge calculations were performed using the European Molecular Biology Laboratory (EMBL) Isoelectric Point Service (http://www.emblheidelberg.de/cgi/pi-wrapper.pl). The pI (the pH at which the protein carries no net charge) for this Tat sequence was determined to be 10.43.

Figure 4.26: Variation in absolute peak heights with increasing pH for observed glycine residues from ¹H/¹⁵N HSQC spectra measured at 293 K for pH values 3.3, 4.1, 5.3, 5.8, and 6.7. Noise estimates in the spectra varied in the 1 × 10⁵ to 3 × 10⁵ range. [Plots of peak height against pH for (a) Gly-13, (b) Gly-18, (c) Gly-35, (d) Gly-64, (e) Gly-68 and (f) Gly-81.]

Figure 4.27: Variation in absolute peak heights with increasing pH for selected serine and threonine residues from ¹H/¹⁵N HSQC spectra measured at 293 K at pH values 3.3, 4.1, 5.3, 5.8, and 6.7. Noise estimates in the spectra varied in the 1 × 10⁵ to 3 × 10⁵ range. [Plots of peak height against pH for (a) Ser-19, (b) Ser-36, (c) Ser-82, (d) Thr-40, (e) Thr-43 and (f) Thr-84.]

Figure 4.28: Decrease in calculated net charge with increasing pH for His-tagged Tat1−72. The net charge was determined using the EMBL Isoelectric Point Service (http://www.emblheidelberg.de/cgi/pi-wrapper.pl). Filled circles correspond to the calculated net charge for the protein sequence at increments of 0.5 pH units. Open circles correspond to the predicted net charge of the protein at pH values used for the HSQC measurements in Fig. 4.24 from a cubic spline interpolation of the EMBL calculations. [Plot of net charge against pH.]

4.9 Disorder Predictions

Several programs exist to search protein sequences for regions of disorder (DisEMBL, PONDR, FoldIndex, RONN, IUPred, DISOPRED and DisProt). Four of these programs, PONDR [56–58], RONN [61], DisProt [59, 60] and IUPred [54, 55], were tested with the sequence of His-tagged Tat1−72 to compare their predictions with observations of the protein's flexibility measured by NMR spectroscopy.


The DisProt neural network-based predictions shown in Figure 4.29 ((a) to (c) in order of increasing algorithm complexity) indicate that the protein is essentially completely disordered, with only a slight suggestion of structure in the vicinity of Lys-49. Prediction scores > 0.5 indicate disorder, while scores < 0.5 indicate order in the sequence. The prediction scores using the VL3 algorithm (Fig. 4.29(a)) for Lys-48, Lys-49 and Cys-50 are 0.474, 0.482, and 0.491, respectively. The VL3H and VL3E algorithms both predict that all residues are disordered, with scores > 0.5, but there is a tendency to order in the region of Lys-48 to Cys-50. The disorder predictions produced by PONDR (Fig. 4.30), RONN (Fig. 4.31) and IUPred (Fig. 4.32) all indicate that an ordered region exists in the segment containing the Cys-rich region of the protein, although the width of the ordered segment varies among the predictions. The PONDR scores (Fig. 4.30) also indicate a short, ordered segment between Val-87 and Lys-91.
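The 0.5 threshold rule can be expressed as a small helper that extracts runs of residues scored as ordered. This is an illustrative sketch rather than part of any of the prediction programs: the function name is hypothetical, and in the example only the three VL3 scores quoted above are taken from the text, while the two flanking scores are made up to close the segment.

```python
def ordered_segments(scores, threshold=0.5, first_residue=1):
    """Return (start, end) residue ranges whose scores fall below threshold."""
    segments = []
    start = None
    for offset, score in enumerate(scores):
        residue = first_residue + offset
        if score < threshold:
            if start is None:
                start = residue          # open a new ordered run
        elif start is not None:
            segments.append((start, residue - 1))
            start = None                 # close the current run
    if start is not None:
        segments.append((start, first_residue + len(scores) - 1))
    return segments


# Residues 47-51: the scores for 48-50 are the quoted VL3 values,
# the flanking values for 47 and 51 are hypothetical.
vl3_window = [0.55, 0.474, 0.482, 0.491, 0.52]
print(ordered_segments(vl3_window, first_residue=47))
```

Applied to a full per-residue score list, the same helper gives the width of each predicted ordered segment, which is the quantity that differs among the predictors compared here.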

209

DISPROT (VL3) score

1

0.8

0.6

0.4

0.2

0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90

Residue (a)

DISPROT (VL3H) score

1

0.8

0.6

0.4

0.2

0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90

Residue (b)

Figure 4.29: DisProt [59, 60] disorder predictions for the His-tagged Tat1−72 amino acid sequence using the algorithms: (a) VL3, (b) VL3H, and (c) VL3E. Prediction scores > 0.5 indicate disorder, while scores < 0.5 indicate order in the sequence.

210

[Figure 4.29, panel (c): DisProt VL3E score (vertical axis, 0 to 1) plotted against residue number (horizontal axis, 0 to 90).]

Figure 4.29: continued

[Figure 4.30: PONDR score (vertical axis, 0 to 1) plotted against residue number (horizontal axis, 0 to 90).]

Figure 4.30: PONDR [56–58] disorder predictions for the His-tagged Tat1−72 amino acid sequence. Prediction scores > 0.5 indicate disorder, while scores < 0.5 indicate order in the sequence.


[Figure 4.31: RONN score (vertical axis, 0 to 1) plotted against residue number (horizontal axis, 0 to 90).]

Figure 4.31: RONN [61] disorder predictions for the His-tagged Tat1−72 amino acid sequence. Prediction scores > 0.5 indicate disorder, while scores < 0.5 indicate order in the sequence.

[Figure 4.32: IUPred disorder tendency (vertical axis, 0 to 1) plotted against residue number (horizontal axis, 0 to 90).]

Figure 4.32: IUPred [54, 55] disorder predictions for the His-tagged Tat1−72 amino acid sequence. Prediction scores > 0.5 indicate disorder, while scores < 0.5 indicate order in the sequence.


Chapter 5 Discussion

5.1 Protein Expression and Purification

The Tat1−72 gene was cloned into pET28 by G. Henry to enable protein expression with an N-terminal, thrombin-cleavable, 6×His purification tag. Metal affinity chromatography has been used to rapidly purify many proteins to greater than 95% purity in a single step; rapid purification was considered important for the highly oxidation-prone Tat protein. The ability to elute the protein from the resin at low pH also helped prevent disulphide bond formation. A number of approaches to increasing Tat1−72 expression have been attempted in the past [256, 257]. In this case, the pSV2tat72 vector [186] was chosen as the source of the tat gene because it is codon-optimized for expression in E. coli. As Tat has well-described cytotoxic effects, the pET28 plasmid was chosen for expression because of the stringent control over the lacUV5 promoter owing to the presence of the lac repressor (lacI) [258]; the plasmid is also less easily lost from the cell because of the kanamycin resistance gene. The choice of a pLysS-containing host was also made with the toxicity of Tat in mind but has the added advantage of more facile cell lysis by the endogenous T7 lysozyme.

Several reducing agents (β-mercaptoethanol, dithiothreitol, tris(2-carboxyethyl)phosphine, tris(2-cyanoethyl)phosphine, tris(hydroxypropyl)phosphine, and sodium sulphite) were tested for the production of reduced, monomeric Tat1−72. Tris(2-carboxyethyl)phosphine (TCEP), in the presence of 6 M guanidine, was found to be the most effective. TCEP is both a stronger reducing agent and effective over a wider pH range (1.5-8.8) than thiol reducing agents such as β-mercaptoethanol (βME) and dithiothreitol (DTT) [259]. Its use permitted the entire purification, from cell lysis at neutral pH to elution from the cobalt resin at pH 4, to be done in a strongly reducing environment. This is not possible with thiol reducing agents, as neither DTT (pKa = 9.2, 10.1) nor βME (pKa = 9.6) is an effective reducing agent at low pH [260]. According to the manufacturers (Clontech and QIAGEN), DTT is not compatible with metal affinity resins, but we observed no incompatibility between TCEP and immobilized Co2+ and Ni2+. βME can be used at concentrations up to 20 mM in metal affinity chromatography (according to the QIAGEN manual), but it is a much less effective reducing agent than DTT or TCEP, especially at lower pH values.

One problem with TCEP is that it carries three negative charges at neutral pH and precipitates the highly basic Tat protein. To overcome this problem, 6 M guanidine was included in the extraction buffer, and the guanidine and TCEP were subsequently removed together by dialysis at low pH. Tris(2-cyanoethyl)phosphine (TCP) is also a strong reducing agent and has added advantages: it is neutral, it does not precipitate Tat, and it can access less solvent-exposed sulfhydryls. Unfortunately, TCP is less soluble and significantly less stable than TCEP, oxidizing more readily in air.
Tris(hydroxypropyl)phosphine (THP) was also investigated as a reducing agent as it is miscible with water and neutral. However, THP is a viscous liquid and very reactive. To deal with the short lifetime of the reducing agent, attempts were made to degas and seal protein samples containing THP in the NMR tubes. However, it was found that the viscosity of THP-aqueous Tat mixtures significantly increased the protein NMR line-widths. Another reducing agent that was tested was sodium sulphite. A small amount was added to the NMR sample at low pH; it has the advantage of being invisible to ¹H NMR spectroscopy but is only a mild reducing agent in comparison to TCEP.

The use of 6 M guanidine throughout the purification has several advantages over alternatives such as 8 M urea with or without 1-2 M NaCl. In comparison to urea, guanidine solutions consistently yielded the highest amounts of soluble Tat in the initial cell lysates. Presumably, high concentrations of guanidine encourage dissociation of Tat from DNA, RNA, and other anionic molecules. Furthermore, removal of the guanidine during the washing steps, while the protein was bound to the metal affinity resin, resulted in very slow elution of the protein from the resin, suggesting that the Arg-rich basic domain of the protein can interact with the nitrilotriacetic acid groups of the Sepharose resin that have lost their coordinated metal ion. In the presence of 6 M guanidine at pH 4, 70% of the Tat protein eluted from the immobilized metal resin in the first two 1 mL fractions (based on UV absorbance measurements with spectroscopic grade guanidine hydrochloride), whereas in the absence of guanidine the same fraction of protein eluted in approximately 40 mL. Using denaturant throughout each step of the purification, followed by its removal during dialysis, was found to be the most efficient way to obtain large quantities of reduced, monomeric protein.

Removal of the denaturant and reducing agent by dialysis necessitated the use of low pH buffers to maintain the cysteine thiol groups in their unreactive protonated state. Degassing the dialysis and NMR buffers and maintaining an argon atmosphere further reduced the possibility of oxidation of the protein. TCEP was not used as a reducing agent in the NMR samples because it apparently precipitates Tat at neutral pH. Instead, rigorous degassing of the sample buffer and addition of a mild reducing agent (sodium sulphite) allowed preparation of NMR samples that were stable for more than 6 months.


5.2 NMR Spectroscopy and Backbone Assignment

The spectral dispersion of the resonances shown in Figure 4.3 is typical for disordered proteins. The clustering of resonances into three regions is also expected for disordered proteins. The narrowness of the dispersion range in both the ¹H and ¹⁵N dimensions is indicative of disorder and is virtually identical to the dispersion of resonances for strongly denatured ubiquitin [236]. The resonance line-widths are broad relative to those of the native state of ubiquitin (MW = 8565 Da, τc = 4.1 ns), which has ¹H and ¹⁵N linewidths of 6-9 Hz and 3 Hz respectively [169]. The ¹H and ¹⁵N linewidths of Tat have mean values of 15±5 Hz and 6±1 Hz, suggesting possible conformational exchange on the intermediate NMR timescale (µs-ms range). Line broadening may also result from hydrogen exchange with the water solvent. However, as these Tat samples are at low pH (∼4), where the rate of hydrogen exchange is near its minimum [261, 262], it is not likely that exchange with the solvent is the major cause of the line broadening. Hence, the broad lines observed for Tat are most likely the result of conformational exchange in the µs-ms range, as one would expect for a protein that lacks regular secondary structural elements.

The intensity profile depicted in Figure 4.5 shows that the weakest resonances lie within the Cys-rich (residues 42-57) and core (residues 58-67) regions of the protein. Several residues (see Tables A.1 and A.2 in Appendix A) are observed to have multiple resonances in slow exchange. Many of these additional resonances are associated with residues in the Cys-rich and core regions of Tat. The decreased intensity of resonances within these regions is the result of the intensity being split between multiple signals in slow exchange or of line broadening of the residues in intermediate exchange. The conformational exchange on the µs-ms timescale in these regions may indicate regions of transient structure formation, which may only become stabilized in the presence of zinc ions or on binding to TAR, cyclin T1, or other binding partners.

The absence of additional unassigned peaks in the NMR spectrum of the unlabelled Tat (Fig. 4.3(b)) indicates a high level of purity, since other unlabelled proteins in the sample would produce signals through their ¹⁵N natural abundance, and confirms the MALDI-TOF-MS analysis. The backbone assignment described in [166] resulted in unambiguous assignment of 80 of the 83 observable (non-proline and non-N-terminal) amide resonances. The assignments of the Cys residues are particularly informative as they confirm that all of the Cys residues are reduced; all of the Cα and Cβ chemical shifts (shown in Fig. 4.7) observed in the 3-dimensional HNCACB [194] spectrum are in the range of the random coil chemical shifts of reduced cysteine rather than oxidized cysteine involved in disulphide bond formation. The chemical shifts of the Cys residues also confirm the findings from the MALDI-TOF-MS analysis that the protein is in the reduced monomeric state and that the weak peaks in Figure 4.2 most likely indicate the presence of non-covalent oligomers formed during the MS analysis.

Due to the narrow dispersion and overlap of signals in the proton dimension, many experiments were required for the sequential assignments of the protein backbone. All of the experiments used (listed in Table 3.3) required 3-dimensional HN detection for the assignment of protein backbone resonances. These experiments utilize the increased dispersion in the ¹⁵N and ¹³C dimensions to resolve the clustered regions of the ¹H/¹⁵N-HSQC spectra. However, in order to resolve some of the clustered resonances, the experiments required higher resolution than is used for folded proteins of comparable size. The sequential assignment of the backbone was further complicated by the presence of multiple slow-exchange resonances, which needed to be assigned as well.

5.3 Chemical Shifts and Coupling Constants

The use of the chemical shift as an indicator of secondary structure has been applied to the analysis of proteins under denaturing conditions to identify regions of residual structure in the protein [254, 263–265]. In some cases, non-native residual structure is identified that may indicate transient intermediates along the protein folding pathway [254, 263]. In the case of intrinsically disordered proteins, which lack identifiable structural elements at high resolution, the chemical shift provides a means of identifying residual structure that may be characterized as a helix, loop or sheet. The CSI procedure introduced by Wishart and Sykes [241] has been applied to identify the secondary structural elements in folded proteins. However, for the study of unfolded, denatured, or disordered proteins, variations in the procedure have been adopted to account for local sequence effects [203, 204, 252] on the reference shifts for the random coil. Consistent deviations over three or four residues from the sequence-corrected random coil values may serve as indicators of residual structure in these proteins. In the absence of the sequence correction to the random coil chemical shifts, small chemical shift variations along a protein sequence may be too subtle to reveal conformational preferences along the backbone.

For His-tagged Tat1−72 (Fig. 4.8), the majority of the chemical shift differences from the sequence-corrected random coil shifts [203, 204] lie within the bounds of the random coil conformation. Because no sequence-dependent corrections were available for the Cβ random coil chemical shifts, these differences are less informative than the other difference plots. The difference plots for the HN, Cα and Hα chemical shifts, as well as the ³JHNHα coupling constants, show a slight preference for helical conformation in the vicinity of Glu-29, but there is no uninterrupted segment of 3 or more residues defining a helical domain. The difference plot for the C’ shifts (Fig. 4.8(c)) indicates a slight preference for the β conformation in the Cys-rich region (residues 42-57), although this cannot be confirmed from any of the other difference plots. The ³JHNHα coupling constants cannot verify this observation, as many of the Hα shifts in this region were absent in the HNHA experiment used to measure the coupling constants.

The region with a weak preference for helical conformations near Glu-29 is very close to the single Trp residue at position 31 (noted in Fig. 4.9). Studies of the unfolded state of the drkN SH3 domain observed non-native burial of the Trp indole ring at the centre of a hydrophobic cluster [169, 191]. The Trp indole H-N resonance in the HSQC spectrum (Fig. 4.4 inset) shows multiple signals, one strong and two weak cross-peaks. These cross-peaks may be the result of cis-trans proline isomerization at the preceding Pro residue or may arise from slow conformational exchange between the open and buried Trp states. Isomerization at Pro-30 would more likely affect its preceding residue, Glu-29 (noted in Fig. 4.9), and isomerization at Pro-34 is unlikely to influence chemical shift deviations three residues away at Trp-31. It is possible that interactions between the indole ring of Trp-31 and the charged imidazole ring of His-33 account for some restriction in the flexibility and the weak helical conformational preference in that region. The presence of proline at positions 30 and 34 may also prevent stabilization of this segment as a helix [266–268].

The segment that shows a slight tendency toward the β conformation comprises residues 44-60, which includes most of the Cys-rich region (residues 42-57) of the protein. Although the chemical shift difference plot for the C’ shifts does not reveal any segment of 3 or more residues outside the random coil range, all of the residues in this region have values closer to β-sheet than to α-helix. This Cys-rich region can also be seen in Figure 4.9 as the yellow region deviating from the simple extended structure.

5.4 NMR Relaxation

Despite the relatively poor chemical shift dispersion in the NMR spectra of Tat, relaxation data were obtained for 77% and 72% of the observable resonances (non-proline and non-N-terminal) at the 600 MHz and 800 MHz spectrometer frequencies respectively. An unfortunate consequence of reduced spectrometer time at the higher frequency was the loss of resolution preventing identification of some closely clustered resonances.

The lack of any significant variation in the steady-state heteronuclear NOEs (Fig. 4.11(a)), coupled with their negative values, is a good indicator of the degree of uniform disorder (or less restricted dynamics) throughout the protein backbone at the ns-ps timescale. The mean values for the NOE at 600 and 800 MHz are -1.27±0.46 and -0.93±0.33 respectively. These values are consistent with NOEs obtained by Farrow et al. [90] for the guanidine-denatured state of the 59 residue drkN SH3 domain, which had mean NOEs of -1.41±0.37 (at 500 MHz) and -1.20±0.31 (at 600 MHz). The larger negative NOE values for the unfolded state of drkN SH3 indicate less restricted motions at the ns-ps timescale, in contrast to observations on the folded state of drkN SH3 where the mean NOEs were found to be -0.39±0.09 (at 500 MHz) and -0.36±0.19 (at 600 MHz). The less negative mean values, as well as the narrower variation in the values across the sequence, indicate a greater degree of restricted motions in the folded state of drkN SH3.

The longitudinal relaxation rates (R1) observed for Tat (Fig. 4.11(b)) have mean values of 1.5±0.2 and 1.4±0.2 s⁻¹ at 600 and 800 MHz respectively. These R1 values for Tat are similar to those observed for the denatured state of drkN SH3 (1.6±0.2 s⁻¹ at 500 MHz and 1.5±0.2 s⁻¹ at 600 MHz). The folded state of drkN SH3 has mean R1 values of 2.5±0.3 s⁻¹ (500 MHz) and 2.2±0.2 s⁻¹ (600 MHz). The slightly lower R1 values for the unfolded state of drkN SH3 and for Tat, compared to the folded state of drkN SH3, imply slower relaxation and hence shorter rotational correlation times (τc) and faster dynamics at the ns-ps timescale. The rotating frame longitudinal relaxation rates (R1ρ) measured at 600 and 800 MHz (Fig. 4.11(c)) have means of 3.3±1.1 and 3.5±1.4 s⁻¹ respectively.
These rates show more variation across the sequence than the R1 values, and have local maxima in the neighbourhood of the end of the Cys-rich region (Cys-57 to Ile-59), in the middle of the Pro-rich region (near the residues adjacent to Pro-30), and near the hexahistidine segment of the affinity tag (His-5 to His-10). These same regions are found to have higher values of transverse relaxation rates (R2) measured at 600 MHz (Fig. 4.15). The increased rates in these regions, relative to the neighbouring residues, are likely indicative of contributions from slow conformational exchange. The mean transverse relaxation rate (R2) from the 600 MHz field measurements was found to be 3.8±1.3 s⁻¹. The increase in the average R2 rate over that of the R1ρ is likely the result of the increased sensitivity of the R2 experiment to conformational exchange. Both the R1ρ and R2 means are slightly higher than the values observed with the urea denatured state of drkN SH3 (approximately 3.0±0.9 s⁻¹ at both 500 and 600 MHz) [90]. The folded state of drkN SH3 has mean transverse rates of approximately 6.0±0.8 s⁻¹. Again the slower rates observed for Tat are indicative of faster dynamics at the ms-µs timescale.

It is worth noting that the errors in the R2 and R1ρ measurements for Tat are roughly a factor of 10 and 4, respectively, larger than the errors observed for the R1 measurements. There are several reasons for this increase in the error. In general, the transverse rates are faster than the longitudinal relaxation rates. Consequently, the signal for the resonance decays faster, which reduces the signal-to-noise of the peak and increases the error in the measured intensity. Additionally, the pulse sequences for both the R2 [178, 201] and R1ρ [202] experiments contain many pulses on the ¹⁵N channel for the CPMG (in R2) and spin-lock (in R1ρ) that introduce errors due to magnetic field inhomogeneities. These same pulses on the ¹⁵N channel may also introduce coil and sample heating, which affects both the position and intensity of the resonance (the rotational correlation time decreases as the viscosity of the solvent decreases with increasing temperature [184]) and degrades the lock signal. The heating of the sample and coil as the relaxation delay increases limits the range of relaxation delays available for both the R2 and R1ρ experiments (relaxation delays should be ≲ 250 ms). This limit on the maximum relaxation delay results in a much smaller range of sampling times compared to the R1 experiments, which sample relaxation delays between 0 and 4 seconds so that the decay to zero is observed.

This is illustrated in the sample fits for the T1 (Fig. 4.13), T1ρ (Fig. 4.14) and T2 (Fig. 4.17) data of Gly-68, which show the narrow range of data sampled in the T1ρ and T2 measurements (for which the decay to zero is not observed) compared to the T1 measurements.
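The effect of the sampling window on a fitted rate can be sketched with a simple single-exponential model, I(t) = I0 exp(−Rt). The Python example below is an illustration under stated assumptions, not the fitting procedure used for the data in this thesis: it uses an unweighted log-linear least-squares fit, the function name `fit_decay_rate` is hypothetical, and the delays and intensities are synthetic, noise-free values.

```python
import math

def fit_decay_rate(delays_s, intensities):
    """Unweighted least-squares fit of ln(I) = ln(I0) - R*t.

    Returns (R, I0). This log-linear fit is a simplification; a weighted
    non-linear fit would normally be preferred for noisy data.
    """
    n = len(delays_s)
    logs = [math.log(i) for i in intensities]
    mean_t = sum(delays_s) / n
    mean_y = sum(logs) / n
    # Sum of squared deviations of the delays: a narrow sampling
    # window makes this small and the slope less well determined.
    sxx = sum((t - mean_t) ** 2 for t in delays_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays_s, logs))
    slope = sxy / sxx
    return -slope, math.exp(mean_y - slope * mean_t)

# Synthetic decay sampled only up to 250 ms (hypothetical values).
true_rate = 3.8  # s^-1
delays = [0.01, 0.03, 0.05, 0.09, 0.13, 0.17, 0.21, 0.25]
intensities = [100.0 * math.exp(-true_rate * t) for t in delays]

rate, i0 = fit_decay_rate(delays, intensities)
print(f"fitted R = {rate:.3f} s^-1, I0 = {i0:.1f}")
```

For ordinary least squares with uniform noise, the variance of the fitted slope is inversely proportional to `sxx`, so restricting the delays to a narrow window gives a less certain rate, in line with the larger errors noted above for R2 and R1ρ.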

5.5 Reduced Spectral Density Mapping

The original implementation of spectral density mapping by Peng and Wagner [181] used an expanded set of six relaxation experiments in order to evaluate the spectral density function at the five critical frequencies for a single field strength: J(0), J(ωN), J(ωH − ωN), J(ωH), and J(ωH + ωN), as well as the contribution from conformational exchange on the µs-ms timescale. The reduced spectral density method [90, 181, 213] avoids the collection of six relaxation rates by replacing J(ωH) and J(ωH ± ωN) with a single high frequency spectral density function [181] and combining the contribution from conformational exchange into an effective low frequency spectral density, Jeff(0), defined in equation (3.6).

$$J_{\mathrm{red}}^{\mathrm{hifreq}} = J(\omega_H + \omega_N) = J(\omega_H - \omega_N) = J(\omega_H) \qquad (5.1)$$

The above assumption is deemed reasonable because J(ω) is relatively flat at the frequencies ωH and ωH ± ωN, and because the heteronuclear cross-relaxation rates become smaller with increasing field strength [181]; the latter observation is true for Tat but the former has not been verified. The reduced spectral density approach thus allows for the mapping of the spectral densities at three frequencies, Jeff(0), J(ωN), and J(0.87ωH), using only three relaxation data sets, which for convenience are chosen to be the R1, R1ρ and the steady-state heteronuclear NOE since they are also used for the model-free formalism. If relaxation data are collected at two field strengths, then the spectral density is mapped at five frequencies corresponding to zero frequency and the Larmor frequencies of the ¹⁵N and ¹H spins at each field.
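The mapping from three measured parameters to three spectral density values can be sketched as follows. This Python example uses the expressions commonly given for the reduced method of Farrow et al. [90]; the equations of Chapter 3 are not reproduced in this section, so their exact agreement with this sketch is an assumption. The function name, the N-H bond length (1.02 Å) and the ¹⁵N chemical shift anisotropy (−160 ppm) are also assumptions. The NOE is taken as the intensity ratio Isat/Iunsat; if NOE values are reported as enhancements (Isat/Iunsat − 1), one must be added before use. The inputs are hypothetical values of the same order as the 600 MHz means quoted in Section 5.4.

```python
import math

# Physical constants (SI units). Bond length and CSA are assumed typical values.
MU0_OVER_4PI = 1.0e-7        # T m A^-1
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1 (1H)
GAMMA_N = -2.7116e7          # rad s^-1 T^-1 (15N, negative)
R_NH = 1.02e-10              # m, assumed N-H bond length
CSA_N = -160e-6              # assumed 15N chemical shift anisotropy

def reduced_spectral_density(r1, r2, noe_ratio, b0_tesla):
    """Return (Jeff(0), J(wN), J(0.87wH)) in s/rad.

    r1 and r2 are in s^-1 (r2 may be a transverse rate derived from R1rho);
    noe_ratio is Isat/Iunsat. Exchange contributions stay inside Jeff(0).
    """
    d2 = (MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH**3) ** 2  # dipolar term
    c2 = (GAMMA_N * b0_tesla * CSA_N) ** 2 / 3.0                   # CSA term
    sigma = r1 * (noe_ratio - 1.0) * GAMMA_N / GAMMA_H             # cross-relaxation
    j_high = 4.0 * sigma / (5.0 * d2)
    j_n = (4.0 * r1 - 5.0 * sigma) / (3.0 * d2 + 4.0 * c2)
    j_0 = (6.0 * r2 - 3.0 * r1 - 2.72 * sigma) / (3.0 * d2 + 4.0 * c2)
    return j_0, j_n, j_high

# Hypothetical inputs at 14.1 T (600 MHz); noe_ratio = -0.27 corresponds to
# an enhancement of -1.27 if that reporting convention is assumed.
j0, jn, jh = reduced_spectral_density(r1=1.5, r2=3.3, noe_ratio=-0.27, b0_tesla=14.1)
print(f"Jeff(0)   = {j0 * 1e9:.2f} ns/rad")
print(f"J(wN)     = {jn * 1e9:.2f} ns/rad")
print(f"J(0.87wH) = {jh * 1e12:.1f} ps/rad")
```

Because the transverse rate enters only the expression for Jeff(0), any error in R2 or R1ρ affects the low frequency value alone, while J(ωN) and J(0.87ωH) depend only on R1 and the NOE.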

Figure 5.1(a) shows the plots of the longitudinal and transverse relaxation rates and the steady-state heteronuclear NOE determined as a function of the overall rotational correlation time of the molecule from evaluation of the relaxation parameters in equations (2.173), (2.174) (assuming Rex = 0) and (2.186) using the orientational spectral density function in (2.135) for an isotropically tumbling rigid body, where τc represents the overall rotational correlation time [269]. The plots were calculated with respect to a 14.1 T magnetic field. Figure 5.1(b) shows the variation in the corresponding spectral density functions with the overall rotational correlation time from evaluation of (2.135) at frequencies 0, 61, and 600 MHz. Both plots in Fig. 5.1 are presented on a logarithmic scale except for the NOE (right y-axis). For a protein the size of His-tagged Tat1−72 (10.5 kDa), the theoretical rotational correlation time according to the Stokes-Einstein-Debye equation [261, 270] for a spherical body is given by

$$\tau_m = \frac{\eta V}{kT} \qquad (5.2)$$

where η is the viscosity of the sample (1.014 mPa·s), V is the hydrodynamic volume (1.2715 × 10⁻²⁶ m³), k is the Boltzmann constant (1.38066 × 10⁻²³ J·K⁻¹), and T is the temperature (293 K). The calculated overall rotational correlation time for His-tagged Tat1−72 using equation (5.2) is 3.19 ns.
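The arithmetic behind equation (5.2) can be checked directly. The short Python sketch below uses only the parameter values quoted in the text; the function name `sed_correlation_time` is a hypothetical label introduced for illustration.

```python
# Stokes-Einstein-Debye estimate of the rotational correlation time
# for a spherical body: tau_m = eta * V / (k * T).

BOLTZMANN = 1.38066e-23  # J/K, value quoted in the text

def sed_correlation_time(viscosity_pa_s, volume_m3, temperature_k):
    """Return the rotational correlation time in seconds."""
    return viscosity_pa_s * volume_m3 / (BOLTZMANN * temperature_k)

# Values quoted in the text: 1.014 mPa.s, 1.2715e-26 m^3, 293 K.
tau_m = sed_correlation_time(1.014e-3, 1.2715e-26, 293.0)
print(f"tau_m = {tau_m * 1e9:.2f} ns")  # prints "tau_m = 3.19 ns"
```

Since τm scales linearly with both η and V, any uncertainty in the sample viscosity or the assumed hydrodynamic volume carries over proportionally to the estimated correlation time.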


[Figure 5.1, panel (a): log10(Ri) for R1 and R2 (left axis) and the NOE (right axis) plotted against −log10(τm). Panel (b): log10[J(ω)] for J(0), J(ωN) and J(ωH) plotted against −log10(τm).]

Figure 5.1: (a) Variation in the theoretical relaxation rates and steady-state heteronuclear NOE with overall rotational correlation time from evaluation of equations (2.173), (2.174), and (2.186) assuming Rex = 0 and using the orientational spectral density function defined in equation (2.135) relative to a 14.1 T field; (b) Variation in the orientational spectral density function evaluated at zero, ωN and ωH frequencies relative to a 14.1 T field with the overall rotational correlation time (τm).

The plots of the spectral density function (Fig. 5.1(b)) evaluated at 0 and ωN (61 MHz) are clearly collinear in the ps-ns range but begin to diverge near 10 ns. Given the estimated correlation time from the Stokes-Einstein-Debye equation of 3.19 ns for His-tagged Tat1−72, the J(0) and J(ωN) motions will be correlated for the 14.1 T and 18.8 T fields. Conversely, the spectral density function at low and mid-frequencies will be anticorrelated with J(ωH) [271]. The integral of the spectral density function J(ω) is constant over the entire frequency range [184]. For motionally restricted amide bond vectors, the greatest contributions to the spectral density function will come from the low-frequency components and the high-frequency contributions will be minor [272]. Highly mobile N-H vectors will have the greatest contribution to the spectral density function from the high-frequency components, J(ωH), and the low- and mid-frequency contributions will decrease. Consequently, intramolecular motions increase the values of J(ω) at high frequency but decrease its magnitude at low and mid-frequencies for proteins in the small to medium molecular weight range [271, 272]. With these characteristics in mind, the interpretation of the reduced spectral density mapping allows for the following observations (Sections 5.5.1-5.5.3) on the dynamics of the His-tagged Tat1−72 protein.

5.5.1 J(0.87ωH)

The high frequency spectral density plots (Fig. 4.18(a)) show very little variation in range over the length of the sequence except in the region of the hexahistidine affinity tag. There is a significant increase in the high-frequency contributions in the N-terminal His-tag (residues 4-10) observed at 522 MHz. The C-terminal region of the protein does not show any significant increase in high-frequency components. This observation is in contrast to the low pH and urea-unfolded state of apomyoglobin, in which minima in J(0.87ωH) plots correspond to maxima in buried surface area in the folded protein [217], suggesting that hydrophobic interactions persist even in 8 M urea and low pH. Interestingly, the minima in the high frequency spectral densities are less apparent in the acid-unfolded state of apomyoglobin [254], but the maxima in the J(ωN) plot still correlate weakly with the maxima in buried surface area, implying that the spectral density at mid-frequency is more sensitive to residual structure. The lack of definition in the J(522) and J(696) plots for reduced Tat1−72 implies a lack of formation of any residual structure at pH 4.1.

The range of values for the high frequency spectral density corresponds well to the range observed for the guanidine denatured state of drkN SH3 [90], acid-denatured apomyoglobin [254], low pH/urea denatured apomyoglobin [181], and the natively disordered pro-peptide of subtilisin [223]. By comparison, the folded state of drkN SH3 shows J(0.87ωH) values that are roughly half of those observed for the unfolded state [90], indicating a greater degree of restriction of the motions probed at these high frequencies.

5.5.2 J(ωN)

More variation is observed in the range of values for J(ωN) probed at 61 and 81 MHz (Fig. 4.18(b)). The spectral density function at mid-frequencies is sensitive to motions on the ns-ps timescale. In contrast to the high frequency values for small- and medium-sized proteins, the increased motion in the protein backbone is marked by a reduction in the spectral density at mid-frequencies [271]. The J(ωN) plots (in Fig. 4.18(b)) show that the termini have the lowest contribution to the spectral density at mid-frequencies. These residues are those with the slowest relaxation rates and most negative NOEs (Fig. 4.11) and hence are the regions of greatest flexibility. The increased motion at the ends of the protein is common to both folded and unfolded proteins and is often termed end effects [242, 273]. Residues Asn-44, Lys-61 and Ala-62 were also found to have significantly faster motions, on the order of those of the sequence termini.

Schwarzinger et al. [252], in their studies of low pH/urea denatured apomyoglobin, noted that high proportions of glycine and alanine were present in the most flexible regions of the protein and suggested that the Gly/Ala rich segments serve as flexible “molecular hinges”. The region of Tat following the Cys-rich region contains one alanine and two glycines (Ala-62, Gly-64 and Gly-68). Interestingly, Gly-64 is one of the residues that is observed as two distinct peaks of approximately equal intensity in the HSQC spectra of Tat (Fig. 4.3 and 4.4), while Ala-62 and Leu-63 each have four distinct peaks of varying intensity (the most intense peak is labelled in Fig. 4.3 and 4.4 and additional resonances are tabulated in Table A.2). At the other end of the Cys-rich region there is only a single alanine residue (Ala-41) within the 10 residues preceding the Cys-rich region. The small side-chains of the alanine and glycine residues flanking the Cys-rich region may provide a higher degree of mobility at the ends of the Cys-rich region and thereby allow more rapid sampling of conformational space and facilitate the rapid reorientation of the cysteines in the presence of Zn2+ ions or a binding partner (cyclin T1) involved in transcriptional regulation [105, 147, 274].

The reduced spectral density mapping of the acid-unfolded [254] and, to a lesser extent, the low pH/urea denatured [252] state of apomyoglobin showed local maxima of J(ωN) correlated with maxima in the average area buried upon folding. As there is no folded form of the Tat protein with which to compare buried surface area, it can only be said that three residues exceed one standard deviation of the average value of the J(ωN) data at both fields: Glu-29, Trp-31 and Lys-32, perhaps indicating the burial of the indole ring of Trp-31 in a hydrophobic cluster.


5.5.3 Jeff(0)

The errors in the R1ρ measurements noted previously become quite significant in the calculation of the low frequency spectral density, Jeff(0), using equation (3.15) from the method of Farrow et al. [90] and shown in Figure 4.20(a). The measured values for R1ρ are involved only in the analytical solution of Jeff(0) in the reduced spectral density mapping from equations (3.8)-(3.10). The R1ρ errors propagate in the estimation of Jeff(0) using equation (3.15) such that they result in errors in Jeff(0) that are roughly 4 times the errors in the two estimates of Jeff(0) using the 600 and 800 MHz data sets separately. With this in mind, it is more useful to use the mean residue zero frequency spectral density (Jeff(0) in Fig. 4.20(d)) to assess the slow motions of the protein. The overall average in the low frequency spectral density is 0.7±0.3 ns/rad. Residues with significantly larger values of Jeff(0) compared to their neighbouring residues can indicate regions of conformational exchange on the µs-ms timescale. There are eight residues which exceed one standard deviation of the mean in Figure 4.20(d): Asp-25, Leu-28, Glu-29, Trp-31, His-33, Gly-35, Ala-41, and Val-56 (residues Leu-28, Glu-29 and Trp-31 actually exceed 2×s.d.). Of these residues, only Glu-29 and Trp-31 agree with the spectral density mapping at high and mid-frequencies in terms of restriction in the dynamics and may perhaps indicate conformational preferences. According to the chemical shift difference analysis (Fig. 4.8), some residues in this region do exceed the limits of the random coil range. However, it should be kept in mind that, even though exceeding 2×s.d. of the mean, all Jeff(0) values still fall in the range of motions observed for unfolded or partially unfolded proteins [90, 223, 252, 254, 264, 272, 275–277]. The fact that two residues in a three residue segment (Glu-Pro-Trp) have spectral density values consistent with more restricted dynamics does not necessarily imply that they are involved in transient structure formation, as the actual values of all the spectral densities are still in ranges comparable to disordered or denatured proteins. However, based on the chemical shift differences, it may not be unreasonable to suggest that there may be some tendency of this segment to exist in α-helical conformations.

5.6 Model-Free Analysis

The model-free formalism is often employed to interpret NMR relaxation parameters in terms of the motions of folded globular proteins. However, in the case of denatured or disordered proteins, the Lipari-Szabo approach [175, 176] is quite limited because of its underlying assumption that the internal and overall motions are separable. Some success has been achieved in modelling denatured or disordered proteins using variations of the model-free method, such as the Cole-Cole [223, 224] and Lorentzian [225] distribution models, in which the model-free spectral density is based on a distribution of local overall correlation times. However, the assumption that the internal motions are separable from the distributions of overall motions is still present in these approaches.

Interpretation of the relaxation data on partially or fully disordered proteins in terms of the Lipari-Szabo formalism [175, 176] (or variations thereof [221, 223–225]) is complicated by several factors, including the following [269]:

• The disordered state of the protein exists as an ensemble of conformations in fast exchange on the NMR timescale, and the measured relaxation parameters therefore represent a population-weighted average of the ensemble through chemical shift averaging.
• The shape of any one of the conformations in the ensemble could be anisotropic.
• The disordered protein is not likely to be fully extended and behave as a rigid rotor, which may preclude a description of molecular reorientation in terms of an overall rotational correlation time.
• Some disordered proteins (or partially disordered proteins) exist as an ensemble of conformations in intermediate exchange on the NMR timescale, diminishing the intensities of the NMR resonances.

With regard to the fact that a disordered protein exists as an ensemble of rapidly interconverting conformations, it can be assumed that conformational averaging should ‘smooth’ static anisotropy, resulting in an isotropic average [269, 278]. The generalized order parameter $S^2$ in equation (2.198) from the original Lipari-Szabo formalism [175, 176] describes the amplitude of the motions of the internuclear vector (the ¹H-¹⁵N amide bond vector in this case) on a timescale faster than overall tumbling [279].

The order parameter, according to the assumptions of Lipari and Szabo, is defined such that $0 \le S^2 \le 1$, with $S^2 = 1$ representing complete restriction of internal motion and $S^2 = 0$ representing relaxation in which the internuclear vector is completely dominated by internal motions. The $1 - S^2$ term in equation (2.198) represents the extent of orientational motion that is lost due to the internal motions as opposed to rotational diffusion [279]. The flexibility of a protein backbone is therefore reflected in the magnitude of the order parameter. Because flexibility in a protein often changes as a result of binding interactions, changes in the order parameter may provide a useful indicator of regions where a flexible region of the protein binds to a target (protein binding partner, nucleic acid, etc.). The generalized order parameter could therefore reflect changes in the dynamics of the Cys-rich region of Tat (residues 42-57) in the presence of Zn²⁺ and cyclin T1, or of the basic region (residues 68-77) in the presence of TAR.

Fast internal motions described by the effective internal correlation times ($\tau_e$) are associated with small-amplitude librations resulting from restriction of the N-H bond vector. Internal motions with correlation times shorter than 100 ps are associated with order parameters in the 0.7 to 1.0 range [272, 280]. Regions of consecutive residues in which the internal motions of the N-H bond vector are fast relative to the rest of the segment may indicate transient structure or centres of early folding events. Conversely, slow internal motions are associated with large-amplitude fluctuations of the N-H bond vector in regions of increased flexibility (lower order parameters). Extremely slow effective internal correlation times are associated with completely unrestricted motions, and here $S^2$ tends toward zero.

The simple model-free formalism as originally posed by Lipari and Szabo [175, 176] assumes isotropic molecular tumbling ($\tau_c$) on the nanosecond timescale and fast internal motions ($\tau_e$) with characteristic correlation times of less than 100 ps [175, 176, 281]. However, if the internal motions are much faster than the overall tumbling (with $\tau_c/\tau_e > 100$), then the Lipari-Szabo spectral density function defined in equation (3.18) becomes insensitive to the timescale of internal motions but remains sensitive to the degree of restriction [210]. In such a case, a simplified spectral density equation can be used in which the effective correlation time of internal motions, $\tau_e$, is assumed to be zero and only $S^2$ and $\tau_c$ are optimized. Conversely, if the overall and internal motions occur on similar timescales ($\tau_c/\tau_e < 100$), then the motion cannot be considered isotropic in the Lipari-Szabo sense: the timescales of motion are less clearly separated, and the motions cannot be considered uncorrelated. The extended model-free equations suggested by Clore et al. [221], as in equation (3.22), describe internal motions on two uncorrelated timescales (slow and fast) in which an order of magnitude difference exists between $\tau_s$ and $\tau_f$. However, the slow timescale motions often approach the timescale of overall motion, and there is not a well-defined separation between overall tumbling and internal motion [210].
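As an illustration of the timescale argument above, the following Python sketch evaluates the simple Lipari-Szabo spectral density in its standard textbook form, $J(\omega) = \frac{2}{5}\left[\frac{S^2\tau_c}{1+\omega^2\tau_c^2} + \frac{(1-S^2)\tau}{1+\omega^2\tau^2}\right]$ with $1/\tau = 1/\tau_c + 1/\tau_e$. The prefactor and notation may differ from those of equation (3.18). The function name, the 60.8 MHz ¹⁵N frequency, and the values $S^2 = 0.8$ and $\tau_c = 5$ ns are illustrative assumptions, not values taken from this work.

```python
import math


def lipari_szabo_j(omega, s2, tau_c, tau_e):
    """Simple model-free spectral density J(omega) in seconds.

    omega : angular frequency (rad/s)
    s2    : generalized order parameter, 0 <= s2 <= 1
    tau_c : overall rotational correlation time (s)
    tau_e : effective internal correlation time (s); 0 drops the internal term
    """
    if not 0.0 <= s2 <= 1.0:
        raise ValueError("s2 must lie in [0, 1]")
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    if tau_e == 0.0:
        # Simplified form: internal motion treated as infinitely fast.
        return 0.4 * overall
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)


# Assumed 15N Larmor frequency (about 60.8 MHz) and assumed parameters.
omega_n = 2.0 * math.pi * 60.8e6
tau_c = 5e-9
for tau_e in (0.0, 10e-12, 50e-12, 1e-9):
    j = lipari_szabo_j(omega_n, 0.8, tau_c, tau_e)
    print(f"tau_e = {tau_e:.1e} s   J(omega_N) = {j:.3e} s")
```

With these assumed values, $\tau_e = 10$ ps changes $J(\omega_N)$ by well under 1% relative to the $\tau_e = 0$ limit, whereas $\tau_e = 1$ ns changes it by more than 10%, which is the loss of sensitivity to fast internal motions described above.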
In the case of the model-free estimation of dynamics parameters for the His-tagged Tat1−72, the timescales of internal and overall motion are not well separated. As such, it is difficult to draw definitive conclusions about the motions of the residues along the chain with any model-free method. The simplest method that best fits the data is Model 7, which uses the Cole-Cole distribution of local rotational tumbling and no conformational exchange. Addition of a conformational exchange term to the model (as in Model 8) results in only a slight reduction in the Rf values, but increases both the number of parameters estimated and the value of mean[AIC]+sd[AIC] for the overall fit (see Table 4.2). The difference between Models 7 and 8 is very slight in terms of selection criteria, but as the sum of the $\chi^2$ values across the protein sequence is less than the sum of the 95% confidence limit critical values, $\chi^2(0.95)$, it is reasonable to choose Model 7 to represent the data since it is the simpler of the two models. However, it should be noted that, despite the fact that $\sum_i^N \chi^2_i$ […] 100 ps, leaving barely one order of magnitude separating the timescales of the two types of motion. The low average value of $\tau_e$ stems from the result of $\tau_e = 0$ for 14 residues. Not surprisingly, each of these 14 residues corresponds to a failure of the $\chi^2_i < \chi^2(0.95)_i$ condition, which suggests that a different model should be used. Alexandrescu and Shortle [269] observed that failures to meet the $\chi^2_i < \chi^2(0.95)_i$ condition in the two-timescale model suggested that estimations should be made with a model for a different number of timescales of motion (in that case they increased the number of timescales by using the extended model-free approach of Clore et al. [221]).

The use of the model-free formalism to study the dynamics of a disordered or partially disordered protein has several pitfalls and is not widespread. Despite some successes in parameter estimation with variations in the approach, such as the use of a distribution of overall correlation times [223–225], it is unlikely that a clear separation exists between the timescales of the internal and overall motions. However, the successes in using the modified model-free approach in studying the pro-peptide of subtilisin [223, 224] and the partially unfolded domain 2 of annexin I [225] suggested that some attempt should be made to extract dynamic parameters. Unfortunately, none of the models tested provided estimates with uniform significance across the sequence. It may be worthwhile to consider each residue within a disordered protein, or within a region of disorder in a protein, as an independent tumbling body and to apply the model selection criteria on a per-residue basis. It is difficult to justify treating each residue as a lone amino acid in solution, since there are no studies in the literature referring to such a case. However, if a protein is truly disordered, then there is little reason to assume that one model would represent the entire protein.
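The distinction drawn in this section between a fit that passes on the summed $\chi^2$ and one that passes residue by residue can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in this work: the critical value is taken from the $\chi^2$ distribution with degrees of freedom equal to the number of observables minus the number of fitted parameters, the AIC is taken as $\chi^2 + 2k$ (its usual form for Gaussian errors), and the function names and example numbers are hypothetical.

```python
from statistics import mean, stdev

# Upper 95% critical values of the chi-square distribution by degrees of freedom.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def aic(chi2, n_params):
    """Akaike's information criterion for a least-squares fit (assumed form)."""
    return chi2 + 2 * n_params


def summarize_model(chi2_per_residue, n_params, n_observables):
    """Summarize one model fitted independently to every residue."""
    dof = n_observables - n_params
    critical = CHI2_95[dof]
    # Residues whose individual chi-square fails the 95% criterion.
    failures = [i for i, c in enumerate(chi2_per_residue) if c >= critical]
    aics = [aic(c, n_params) for c in chi2_per_residue]
    return {
        "sum_chi2": sum(chi2_per_residue),
        "sum_critical": critical * len(chi2_per_residue),
        "failed_residues": failures,
        "mean_plus_sd_aic": mean(aics) + stdev(aics),
    }


# Hypothetical chi-square values for four residues, three fitted parameters,
# and six observables per residue.
result = summarize_model([1.2, 0.8, 9.5, 2.0], n_params=3, n_observables=6)
print(result)
```

In this made-up example the summed $\chi^2$ is well below the summed critical value, yet the residue at index 2 still fails individually, which is why a per-residue check can point to a different model for part of the sequence.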

5.7 pH Effects

The HSQC spectra for Tat over the pH range from 3.3 to 6.7 (Fig. 4.24) demonstrate a general decrease in the intensities of the cross-peaks as the pH increases. The chemical shifts of the resonances for most residues remain relatively unchanged over the same range of pH values. Exceptions are found for Ser-12, Leu-14, Glu-29, Gly-35, Lys-48, Lys-61 and Lys-91, where small changes in the ¹H and ¹⁵N chemical shifts are observed. The absence of increased dispersion in the resonances as the pH is raised supports the suggestion that the disordered state of Tat persists at physiological pH.

There are several possible explanations that might account for the variation in the intensities of the NMR signals in the HSQC spectra as the pH of the Tat samples increases. One possibility is that the loss of signal intensity is due to amide hydrogen exchange with the solvent water, which is most pronounced in the affinity tag (residues 1-20) and Cys-rich (residues 42-57) regions of the protein, where the predicted hydrogen exchange rates are highest (Fig. 4.25). The sensitivity of these regions to hydrogen exchange may result from saturation transfer effects with the water signal, leading to resonances within these regions vanishing rapidly as the pH exceeds 4. The susceptibility of resonances from the affinity tag and cysteine-rich region to hydrogen exchange is indirect evidence that these regions are not involved in the hydrogen bonding that would be present in stable folded conformations and would protect them from hydrogen exchange. The significantly higher rates found in the affinity tag (in particular for the consecutive histidines) are in part responsible for the generally low intensity of affinity tag resonances at or near physiological pH, a general benefit of the tag in NMR spectroscopy. In fact, at pH 6.7 the predicted hydrogen exchange rates for His-6 to His-10 are greater than the amide proton coupling constant ($J_{NH}$ = 94 Hz). When $k_{ex}$ is much greater than $J_{NH}$, the lifetime of the state is too short to be observed in the INEPT period (10.6 ms), and there will be a loss of coherence resulting in no signal [283, 284]. Only one other residue has a hydrogen exchange rate that exceeds $J_{NH}$, and that is Cys-54 at pH 6.7.

A possible mechanism for the observed general loss of signal intensity throughout the protein with increasing pH is the inefficient application of the water “flip-back” pulse prior to acquisition. The flip-back pulse is a selective 90° pulse which returns the bulk of the water magnetization to the +z-axis [285].
Any remaining transverse solvent magnetization is then dephased by the gradient pulses [169]. However, for labile amide protons exchanging with the water, this would lead to saturation transfer effects from the dephased water, resulting in loss of N-H intensity. As the predicted rates of hydrogen exchange for Tat are highest within the affinity tag and Cys-rich regions, these regions would be most affected by saturation transfer between the exchanging amide protons and water, which may account for their rapid disappearance as the pH is raised.

A second possible explanation of the loss of signal intensity that cannot be ignored is the possible formation of oxidized protein as the pH is raised and the cysteines are less likely to be in a protonated state. The increased sizes of oxidized multimers would result in significant differences in their molecular rotational correlation times compared to the monomer and in changes in the chemical environments of many of the residues. Such a complex mixture (dimers, trimers, tetramers, ..., icosamers) would result in many weak signals among those of the monomer that would not likely be detectable unless there were some amount of uniformity in the multimerisation (i.e., all dimers or all trimers, etc.).

It is worth noting that the intensities of the HSQC resonances in Figure 4.24 are highest for the pH 4.1 sample, which is slightly higher than the pH at which the hydrogen exchange rate is at a minimum in model compounds. For model compounds in water, the logarithm of the hydrogen exchange rate reaches a minimum at approximately pH 3 ($\mathrm{pH_{min}}$) [261, 262]. The value of $\mathrm{pH_{min}}$ reflects the ratio between the acid- and base-catalysed exchange rates [286]. For peptides and proteins, deviations from the $\mathrm{pH_{min}}$ of 3.1 observed for model compounds [262, 287] result from unequal effects on the acid- and base-catalysed rate constants: higher $k_a$ and lower $k_b$ rates result in an elevated $\mathrm{pH_{min}} > 3$ [287]. Such deviations of proteins from model compound behaviour are due to sequence-dependent inductive contributions to exchange as well as electrostatic contributions [262].
In solvent-accessible regions of the protein, there may also be local pH effects dependent on the net charge of the protein and the ionic strength of the solution [287]. Additional deviations in $\mathrm{pH_{min}}$ result from protein structure, although those are not likely to be significant in the case of Tat in the present study.
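The dependence of $\mathrm{pH_{min}}$ on the acid- and base-catalysed rate constants can be made explicit with a minimal two-term model, $k_{ex} = k_a[\mathrm{H^+}] + k_b[\mathrm{OH^-}]$, which neglects any water-catalysed term. Minimizing over pH gives $\mathrm{pH_{min}} = \frac{1}{2}\left(\mathrm{p}K_w + \log_{10}(k_a/k_b)\right)$. The Python sketch below assumes $\mathrm{p}K_w = 14$ and uses hypothetical rate constants chosen only so that the reference minimum falls at pH 3; they are not measured values for Tat or for any model compound.

```python
import math


def exchange_rate(ph, k_a, k_b, pk_w=14.0):
    """Exchange rate from acid- and base-catalysed terms (water term ignored)."""
    h = 10.0 ** (-ph)           # [H+]
    oh = 10.0 ** (ph - pk_w)    # [OH-]
    return k_a * h + k_b * oh


def ph_min(k_a, k_b, pk_w=14.0):
    """pH at which exchange_rate is smallest for the two-term model."""
    return 0.5 * (pk_w + math.log10(k_a / k_b))


# Hypothetical reference constants giving a minimum at pH 3.
print(ph_min(1.0, 1e8))
# Raising k_a and lowering k_b shifts the minimum to higher pH.
print(ph_min(10.0, 1e7))
```

In this model a tenfold increase in $k_a$ combined with a tenfold decrease in $k_b$ raises $\mathrm{pH_{min}}$ by one unit, which is the direction of the shift described above for peptides and proteins.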

5.8 Disorder Predictions

The disorder-predicting algorithms tested with the sequence for His-tagged Tat1−72 all found the sequence to be predominantly disordered. The algorithms for RONN (Fig. 4.31), PONDR (Fig. 4.30), IUPred (Fig. 4.32), and DisProt VL3 (Fig. 4.29(a)) predict some degree of order in the cysteine-rich region of the protein (the width of the ordered segment predicted by these methods varies). In all of these cases, the prediction methods are likely weighting the sequence toward order because of the presence of several cysteine residues, which would be expected to be involved in intramolecular disulfide bond formation or coordinated with a metal ion (zinc finger-like). These prediction algorithms do not allow predictions to be forced for the protein in a reduced state and therefore assume that each cysteine has the potential to be involved in a disulfide bond. In general, the prediction algorithms agree with the inferences made from the reduced spectral density mapping about the general lack of restriction in the dynamics and motions of the amide backbone of the protein.

Although the sequential assignment of the protein shows that all of the cysteine residues are in the reduced state, the relaxation data for the cysteine residues are limited due to low signal-to-noise. These residues are likely undergoing slow conformational exchange on the NMR timescale, which results in several cross-peaks of varying intensity being observed for a single residue. Despite the absence of some cysteine residues from the spectral density mapping, the plots in Figures 4.18 and 4.20 show that the cysteine-rich region has varying degrees of motional freedom, but spectral density values for all residues fall within the ranges observed for other disordered or partially disordered proteins [90, 223, 252, 254, 264, 272, 275–277]. Thus, the predictions for sequence disorder throughout the protein from the DisProt program (Fig. 4.29) agree more closely with the reduced spectral density data, with the predictions from the VL3H and VL3E algorithms showing the best agreement.


Chapter 6 Conclusions

We have developed both an efficient method for the bacterial over-expression of Tat1−72 uniformly labelled with ¹⁵N and ¹³C and a rapid purification protocol based on metal affinity chromatography using a hexahistidine affinity tag. This expression and purification system yields on the order of 20 mg of the uniformly labelled protein per litre of labelling medium. Both MALDI-TOF-MS and NMR spectroscopy have shown that the resulting protein is unambiguously reduced and monomeric in solution at pH 4. This expression system provided sufficient protein for detailed structural and dynamic analysis of Tat [166] and may be used to study its interaction with potential binding partners using heteronuclear NMR methods. This is the first example of a uniformly ¹⁵N/¹³C-labelled Tat and paves the way for a number of potential studies of Tat interactions with host cell proteins and TAR.

Intrinsically disordered proteins are classified into the eight categories proposed by Uversky et al. [29] based on their level of disorder (see Fig. 1.1). Measurement of both static and dynamic multinuclear NMR parameters shows that Tat1−72 exists predominantly in a wholly disordered extended (random coil) conformation at pH 4. However, multinuclear NMR has also revealed evidence for multiple backbone conformations mainly, but not exclusively, in the Cys-rich region and core. Possible origins of the minor cross-peaks include cis-trans proline peptide isomerization, minor Cys oxidation, and multiple conformers in slow equilibrium. The multiplicity of some peaks in the spectra, together with broadened peaks and the changes in peak intensity as a function of pH, suggests that the Cys-rich and core regions form transiently stabilized structures at acidic and neutral pH.

The present results are pertinent to Tat’s interactions with intracellular binding partners such as cyclin T1, which are expected to encounter only reduced Tat in the intracellular environment. Cyclin T1 likely recognizes the transiently stabilized structure that forms in the Cys-rich region of Tat. Furthermore, the affinity of Tat for the loop region of TAR is greatly increased by interaction with cyclin T1, suggesting binding-induced folding [288], another feature of intrinsically disordered proteins. Multinuclear NMR is likely to be of value in determining the structures of complexes of Tat and its interaction partners.

Finally, there is considerable interest in developing a Tat vaccine [139] based on the presence of Tat antibodies in HIV-1-infected individuals who are long-term non-progressors to AIDS. The antibodies raised against oxidized protein putatively recognize conformational epitopes, suggesting that, at neutral pH, parts of the protein exist in a stable conformation. The present dynamics analysis suggests that the most likely region to fold is the Cys-rich region and that the formation of disulphide bonds could stabilize local structure there. However, the high positive charge density, lack of hydrophobic residues, and our dynamics analysis suggest that the remainder of the protein is unlikely to form a stable conformation even at neutral pH.

Given the results of this research, there are many directions to take future studies.
Of key importance is our efficient method for obtaining monomeric and isotopically enriched samples for any number of solution-state NMR experiments. One area that needs to be studied is the structural and dynamic behaviour of Tat at pH values closer to physiological conditions. The loss of NMR resonance intensity at higher pH values may require the presence of zinc ions, both to retrieve lost signals corresponding to the Cys-rich region and to prevent intermolecular disulfide cross-links. Even in the presence of zinc, the protein is not likely to adopt a stable conformation, but coordination with zinc may change the rate of conformational exchange within the Cys-rich and core regions, where signal intensity was observed to be weak. Zinc-bound Tat may provide a convenient way to study monomeric Tat without resorting to harsh reducing conditions at higher pH. NMR relaxation studies of Tat at higher pH would provide a reference frame for the dynamics of Tat prior to any future binding studies, and would provide much-needed information on the dynamic behaviour of Tat in solution to help in understanding its role in HIV regulation and other pathogenic effects.

In addition to zinc, NMR investigation of Tat bound to a peptide fragment corresponding to the cyclin T1 binding domain would provide detail on the structural and functional aspects of this key regulatory complex. Addition of TAR RNA to the Tat–co-factor complex would provide greater insight into the role of Tat. However, the study of such a large complex will bring new difficulties in terms of NMR signal intensity due to the slower tumbling of the complex in solution relative to Tat alone.

Another aspect of the aforementioned studies would be the inclusion of the full 101-residue protein. Although the 72-residue protein encoded by the first tat exon appears to govern its transcriptional activities, the remaining 29 residues must be of significance to the virus life cycle since they are conserved in all natural HIV isolates. Thus far, the full-length protein has been ignored in structural studies. In addition to providing additional epitopes for developing Tat vaccines across many HIV subtypes, the presence of residues 73-101 may have an important effect on some of the non-transcriptional activities of Tat that need to be addressed.


Bibliography

[1] Wright, P. E.; Dyson, H. J. J. Mol. Biol. 1999, 293, 321–331.
[2] Dunker, A. K.; Lawson, J. D.; Brown, C. J.; Williams, R. M.; Romero, P.; Oh, J. S.; Oldfield, C. J.; Campen, A. M.; Ratliff, C. M.; Hipps, K. W. J. Mol. Graphics Modell. 2001, 19, 26–59.
[3] Fischer, E. Ber. Dt. Chem. Ges. 1894, 27, 2985–2993.
[4] Lemieux, R. U.; Spohr, U. Adv. Carbohydrate Chem. Biochem. 1994, 50, 1–20.
[5] Wu, H. Chinese J. Physiol. 1931, 1, 219–234.
[6] Mirsky, A. E.; Pauling, L. Proc. Natl. Acad. Sci. USA 1936, 22, 439–447.
[7] Pauling, L.; Corey, R. B.; Branson, H. R. Proc. Natl. Acad. Sci. USA 1951, 37, 205–211.
[8] Pauling, L.; Corey, R. B. Proc. Natl. Acad. Sci. USA 1951, 37, 729–740.
[9] Pauling, L.; Corey, R. B. Proc. Natl. Acad. Sci. USA 1951, 37, 251–256.
[10] Kauzmann, W. Adv. Protein Chem. 1959, 14, 1–63.
[11] Tanford, C. Protein Sci. 1997, 6, 1358–1366.

[12] Kendrew, J. C.; Dickerson, R. E.; Strandberg, B. E.; Hart, R. G.; Davies, D. R.; Phillips, D. C.; Shore, V. C. Nature 1960, 185, 422–427.
[13] Blake, C. C. F.; Koenig, D. F.; Mair, G. A.; North, A. C. T.; Phillips, D. C.; Sarma, V. R. Nature 1965, 206, 757–761.
[14] Watson, J. D.; Crick, F. H. C. Nature 1953, 171, 737–738.
[15] Watson, J. D.; Crick, F. H. C. Nature 1953, 171, 964–967.
[16] Karush, F. J. Am. Chem. Soc. 1950, 72, 2705–2713.
[17] Koshland Jr., D. E. Proc. Natl. Acad. Sci. USA 1958, 44, 98–104.
[18] Bennett, W. S.; Steitz, T. A. Proc. Natl. Acad. Sci. USA 1978, 75, 4848–4852.
[19] Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res. 2000, 28, 235–242.
[20] Bloomer, A. C.; Champness, J. N.; Bricogne, G.; Staden, R.; Klug, A. Nature 1978, 276, 362–368.
[21] Bode, W.; Schwager, P.; Huber, R. J. Mol. Biol. 1978, 118, 99–112.
[22] Aviles, F. J.; Chapman, G. E.; Kneale, G. G.; Crane-Robinson, C.; Bradbury, E. M. Eur. J. Biochem. 1978, 88, 363–371.
[23] Kriwacki, R. W.; Hengst, L.; Tennant, L.; Reed, S. I.; Wright, P. E. Proc. Natl. Acad. Sci. USA 1996, 93, 11504–11509.
[24] Daughdrill, G. W.; Chadsey, M. S.; Karlinsey, J. E.; Hughes, K. T.; Dahlquist, F. W. Nat. Struct. Biol. 1997, 4, 285–291.
[25] Fletcher, C. M.; Wagner, G. Protein Sci. 1998, 7, 1639–1642.

[26] Shortle, D. Adv. Protein Chem. 2002, 62, 1–23.
[27] Bracken, C. J. Mol. Graphics Modell. 2001, 19, 3–12.
[28] Schweers, O.; Schonbrunn-Hanebeck, E.; Marx, A.; Mandelkow, E. J. Biol. Chem. 1994, 269, 24290–24297.
[29] Uversky, V. N.; Oldfield, C. J.; Dunker, A. K. J. Mol. Recognit. 2005, 18, 343–384.
[30] Holt, C.; Sawyer, L. J. Chem. Soc. Faraday Trans. 1993, 89, 2683–2692.
[31] Weinreb, P. H.; Zhen, W.; Poon, A. W.; Conway, K. A.; Lansbury, P. T. Biochemistry 1996, 35, 13709–13715.
[32] Ohgushi, M.; Wada, A. FEBS Lett. 1983, 164, 21–24.
[33] Creighton, T. E. Proc. Natl. Acad. Sci. USA 1988, 85, 5082–5086.
[34] Jackson, S.; Fersht, A. R. Biochemistry 1991, 30, 10428–10435.
[35] Zwanzig, R. Proc. Natl. Acad. Sci. USA 1997, 94, 148–150.
[36] Ptitsyn, O. B.; Uversky, V. N. FEBS Lett. 1994, 341, 15–18.
[37] Uversky, V. N.; Ptitsyn, O. B. Biochemistry 1994, 33, 2782–2791.
[38] Uversky, V. N.; Ptitsyn, O. B. J. Mol. Biol. 1996, 255, 215–228.
[39] Dunker, A. K.; Obradovic, Z. Nat. Biotech. 2001, 19, 805–806.
[40] Uversky, V. N. Protein Sci. 2002, 11, 739–756.
[41] Radivojac, P.; Iakoucheva, L. M.; Oldfield, C. J.; Obradovic, Z.; Uversky, V. N.; Dunker, A. K. Biophys. J. 2007, 92, 1439–1456.
[42] Tompa, P. Trends Biochem. Sci. 2002, 27, 527–533.

[43] Tompa, P. FEBS Lett. 2005, 579, 3346–3354.
[44] Tompa, P.; Csermely, P. FASEB J. 2004, 18, 1169–1175.
[45] Hoh, J. H. Proteins 1998, 32, 223–228.
[46] Rout, M. P.; Aitchison, J. D.; Magnasco, M. O.; Chait, B. T. Trends Cell Biol. 2003, 13, 622–628.
[47] Kim, T.-A.; Avraham, H. K.; Koh, Y.-H.; Jiang, S.; Park, I.-W.; Avraham, S. J. Immunol. 2003, 170, 2629–2637.
[48] Denning, D. P.; Patel, S. S.; Uversky, V.; Fink, A. L.; Rexach, M. Proc. Natl. Acad. Sci. USA 2003, 100, 2450–2455.
[49] Brown, H. G.; Hoh, J. H. Biochemistry 1997, 36, 15035–15040.
[50] Mukhopadhyay, R.; Kumar, S.; Hoh, J. H. BioEssays 2004, 26, 1017–1025.
[51] Tompa, P.; Szasz, C.; Buday, L. Trends Biochem. Sci. 2005, 30, 484–489.
[52] Radivojac, P.; Vucetic, S.; O’Connor, T. R.; Uversky, V. N.; Obradovic, Z.; Dunker, A. K. Proteins 2006, 63, 398–410.
[53] Bussell Jr., R.; Eliezer, D. J. Biol. Chem. 2001, 276, 45996–46003.
[54] Dosztanyi, Z.; Csizmok, V.; Tompa, P.; Simon, I. Bioinformatics 2005, 21, 3433–3434.
[55] Dosztanyi, Z.; Csizmok, V.; Tompa, P.; Simon, I. J. Mol. Biol. 2005, 347, 827–839.
[56] Li, X.; Romero, P.; Rani, M.; Dunker, A. K.; Obradovic, Z. Genome Inform. 1999, 10, 30–40.

[57] Romero, P.; Obradovic, Z.; Li, X.; Garner, E.; Brown, C.; Dunker, A. K. Proteins 2001, 42, 38–48.
[58] Romero, P.; Obradovic, Z.; Dunker, A. K. Genome Inform. 1997, 8, 110–124.
[59] Obradovic, Z.; Peng, K.; Vucetic, S.; Radivojac, P.; Brown, C. J.; Dunker, A. K. Proteins 2003, 53(S6), 566–572.
[60] Peng, K.; Vucetic, S.; Radivojac, P.; Brown, C. J.; Dunker, A. K.; Obradovic, Z. J. Bioinform. Comput. Biol. 2005, 3, 35–60.
[61] Yang, Z. R.; Thomson, R.; McNeil, P.; Esnouf, R. M. Bioinformatics 2005, 21, 3369–3376.
[62] Dyson, H. J.; Wright, P. E. Nat. Rev. Mol. Cell Biol. 2005, 6, 197–208.
[63] Ward, J. J.; Sodhi, J. S.; McGuffin, L. J.; Buxton, B. F.; Jones, D. T. J. Mol. Biol. 2004, 337, 635–645.
[64] Uversky, V. N.; Gillespie, J. R.; Fink, A. L. Proteins 2000, 41, 415–427.
[65] Linding, R.; Jensen, L. J.; Diella, F.; Bork, P.; Gibson, T. J.; Russell, R. B. Structure 2003, 11, 1453–1459.
[66] Jones, D. T.; Ward, J. J. Proteins 2003, 53, 573–578.
[67] Ward, J. J.; McGuffin, L. J.; Bryson, K.; Buxton, B. F.; Jones, D. T. Bioinformatics 2004, 20, 2138–2139.
[68] Prilusky, J.; Felder, C. E.; Zeev-Ben-Mordehai, T.; Rydberg, E. H.; Man, O.; Beckmann, J. S.; Silman, I.; Sussman, J. L. Bioinformatics 2005, 21, 3435–3438.
[69] Linding, R.; Russell, R. B.; Neduva, V.; Gibson, T. J. Nucleic Acids Res. 2003, 31, 3701–3708.

[70] Liu, J.; Rost, B. Nucleic Acids Res. 2003, 31, 3833–3835.
[71] Dunker, A. K.; Garner, E.; Guilliot, S.; Romero, P.; Albrecht, K.; Hart, J.; Obradovic, Z.; Kissinger, C.; Villafranca, J. E. Pac. Symp. Biocomput. 1998, 3, 471–482.
[72] Coeytaux, K.; Poupon, A. Bioinformatics 2005, 21, 1891–1900.
[73] Wootton, J. C. Comput. Chem. 1994, 18, 269–285.
[74] Vucetic, S.; Obradovic, Z.; Vacic, V.; Radivojac, P.; Peng, K.; Iakoucheva, L. M.; Cortese, M. S.; Lawson, J. D.; Brown, C. J.; Sikes, J. G.; Newton, C. D.; Dunker, A. K. Bioinformatics 2005, 21, 137–140.
[75] Greenfield, N. J. Nat. Prot. 2007, 1, 2876–2890.
[76] Price, N. C. Biotechnol. Appl. Biochem. 2000, 31, 29–40.
[77] Receveur-Bréchot, V.; Bourhis, J.-M.; Uversky, V. N.; Canard, B.; Longhi, S. Proteins 2006, 62, 24–45.
[78] Tsai, C.-J.; de Laureto, P. P.; Fontana, A.; Nussinov, R. Protein Sci. 2002, 11, 1753–1770.
[79] Fontana, A.; de Laureto, P. P.; Spolaore, B.; Frare, E.; Picotti, P.; Zambonin, M. Acta Biochim. Pol. 2004, 59, 299–321.
[80] Svergun, D. I.; Koch, M. H. J. Curr. Opin. Struct. Biol. 2002, 12, 654–660.
[81] Longhi, S.; Receveur-Brechot, V.; Karlin, D.; Johansson, K.; Darbon, H.; Bhella, D.; Yeo, R.; Finet, S.; Canard, B. J. Biol. Chem. 2003, 278, 18638–18648.
[82] Vachette, P.; Svergun, D. Small-angle X-ray scattering by solutions of biological macromolecules. In Structure and Dynamics of Biomolecules; Fanchon, E.; Geissler, E.; Hodeau, J.-L.; Regnard, J.-R.; Timmins, P. A., Eds.; Oxford University Press: New York, 2000.
[83] Svergun, D. I.; Koch, M. H. J. Rep. Prog. Phys. 2003, 66, 1735–1782.
[84] Lipfert, J.; Doniach, S. Annu. Rev. Biophys Biomol. Struct. 2007, 36, 307–327.
[85] Koch, M. H. J.; Vachette, P.; Svergun, D. I. Q. Rev. Biophys. 2003, 36, 147–227.
[86] Doniach, S. Chem. Rev. 2001, 101, 1763–1778.
[87] Wüthrich, K. Angew. Chem. Int. Ed. Engl. 2003, 42, 3340–3363.
[88] Chatterjee, A.; Kumar, A.; Chugh, J.; Srivastava, S.; Bhavesh, N. S.; Hosur, R. V. J. Chem. Sci. 2005, 117, 3–21.
[89] Dyson, H. J.; Wright, P. E. Chem. Rev. 2004, 104, 3607–3622.
[90] Farrow, N. A.; Zhang, O.; Forman-Kay, J. D.; Kay, L. E. Biochemistry 1997, 36, 2390–2402.
[91] Palmer, A. G. Chem. Rev. 2004, 104, 3623–3640.
[92] Mittag, T.; Forman-Kay, J. D. Curr. Opin. Struct. Biol. 2007, 17, 3–14.
[93] Barre-Sinoussi, F.; Chermann, J. C.; Rey, F.; Nugeyre, M. T.; Chamaret, S.; Gruest, J.; Dauguet, C.; Axler-Blin, C.; Vezinet-Brun, F.; Rouzioux, C.; Rozenbaum, W.; Montagnier, L. Science 1983, 220, 868–871.
[94] Popovic, M.; Sarngadharan, M. G.; Read, E.; Gallo, R. C. Science 1984, 224, 497–500.
[95] Coffin, J.; Haase, A.; Levy, J. A.; Montagnier, L.; Oroszlan, S.; Teich, N.; Temin, H.; Toyoshima, K.; Varmus, H.; Vogt, P.; Weiss, R. A. Nature 1986, 321, 10.

[96] Turner, B. G.; Summers, M. F. J. Mol. Biol. 1999, 285, 1–32.
[97] Cullen, B. FASEB J. 1991, 5, 2361–2368.
[98] Kingsman, S. M.; Kingsman, A. J. Eur. J. Biochem. 1996, 240, 491–507.
[99] Frankel, A. D.; Young, J. A. T. Annu. Rev. Biochem. 1998, 67, 1–25.
[100] Kwong, P. D.; Wyatt, R.; Robinson, J.; Sweet, R. W.; Sodroski, J.; Hendrickson, W. A. Nature 1998, 393, 648–659.
[101] Zwick, M. B.; Saphire, E. O.; Burton, D. R. Nat. Med. 2004, 10, 133–134.
[102] Garzon, M. T.; Lidon-Moya, M. C.; Barrera, F. N.; Prieto, A.; Gomez, J.; Mateu, M. G.; Neira, J. L. Protein Sci. 2004, 13, 1512–1523.
[103] Haseltine, W. FASEB J. 1991, 5, 2349–2360.
[104] Aiken, C.; Konner, J.; Landau, N. R.; Lenburg, M. E.; Trono, D. Cell 1994, 76, 853–864.
[105] Karn, J. J. Mol. Biol. 1999, 293, 235–254.
[106] Freed, E. O. Somat. Cell Mol. Genet. 2001, 26, 13–33.
[107] Liang, C.; Wainberg, M. A. AIDS Rev. 2002, 4, 41–49.
[108] Ensoli, B.; Barillari, G.; Salahuddin, S. Z.; Gallo, R. C.; Wong-Staal, F. Nature 1990, 345, 84–86.
[109] Albini, A.; Benelli, R.; Presta, M.; Rusnati, M.; Ziche, M.; Rubartelli, A.; Paglialunga, G.; Bussolino, F.; Noonan, D. Oncogene 1996, 12, 289–297.

[110] Albini, A.; Soldi, R.; Giunciuglio, D.; Giraudo, E.; Benelli, R.; Primo, L.; Noonan, D.; Salio, M.; Camussi, G.; Rockl, W.; Bussolino, F. Nat. Med. 1996, 2, 1371–1375.
[111] Goldstein, G. Nat. Med. 1996, 2, 960–964.
[112] Nath, A.; Psooy, K.; Martin, C.; Knudsen, B.; Magnuson, D. S.; Haughey, N.; Geiger, J. D. J. Virol. 1996, 70, 1475–1480.
[113] Pocernich, C. B.; Sultana, R.; Mohmmad-Abdul, H.; Nath, A.; Butterfield, D. A. Brain. Res. Rev. 2005, 50, 14–26.
[114] András, I. E.; Pu, H.; Deli, M. A.; Nath, A.; Hennig, B.; Toborek, M. J. Neurosci. Res. 2003, 74, 255–265.
[115] Banks, W. A.; Robinson, S. M.; Nath, A. Exp. Neurol. 2005, 193, 218–227.
[116] Westendorp, M. O.; Shatrov, V. A.; Schulze-Osthoff, K.; Frank, R.; Kraft, M.; Los, M.; Krammer, P. H.; Droge, W.; Lehmann, V. EMBO J. 1995, 14, 546–554.
[117] Pumfery, A.; Deng, L.; Maddukuri, A.; de la Fuente, C.; Li, H.; Wade, J. D.; Lambert, P.; Kumar, A.; Kashanchi, F. Curr. HIV Res. 2003, 1, 343–362.
[118] Guo, X.; Kameoka, M.; Wei, X.; Roques, B.; Gotte, M.; Liang, C.; Wainberg, M. A. Virology 2003, 307, 154–163.
[119] Lassen, K.; Han, Y.; Zhou, Y.; Siliciano, J.; Siliciano, R. F. Trends Mol. Med. 2004, 10, 525–531.
[120] Kaul, M.; Garden, G. A.; Lipton, S. A. Nature 2001, 410, 988–994.
[121] King, J. E.; Eugenin, E. A.; Buckner, C. M.; Berman, J. W. Microbes Infect. 2006, 8, 1347–1357.

[122] Toborek, M.; Lee, Y. W.; Flora, G.; Pu, H.; András, I. E.; Wylegala, E.; Hennig, B.; Nath, A. Cell. Mol. Neurobiol. 2005, 25, 181–199.
[123] Nath, A.; Geiger, J. Prog. Neurobiol. 1998, 54, 19–33.
[124] Vendel, A. C.; Lumb, K. J. Biochemistry 2003, 42, 910–916.
[125] Derse, D.; Carvalho, M.; Carroll, R.; Peterlin, B. M. J. Virol. 1991, 65, 7012–7015.
[126] Jeang, K.-T.; Xiao, H.; Rich, E. A. J. Biol. Chem. 1999, 274, 28837–28840.
[127] Kuppuswamy, M.; Subramanian, T.; Srinivasan, A.; Chinnadurai, G. Nucleic Acids Res. 1989, 17, 3551–3561.
[128] Garcia, J. A.; Harrich, D.; Pearson, L.; Mitsuyasu, R.; Gaynor, R. B. EMBO J. 1988, 7, 3143–3147.
[129] Smith, S. M.; Pentlicky, S.; Klase, Z.; Singh, M.; Neuveut, C.; Lu, C. Y.; Reitz, M. S.; Yarchoan, R.; Marx, P. A.; Jeang, K. T. J. Biol. Chem. 2003, 278, 44816–44825.
[130] Bieniasz, P. D.; Grdina, T. A.; Bogerd, H. P.; Cullen, B. R. EMBO J. 1998, 17, 7056–7065.
[131] Chen, D.; Wang, M.; Zhou, S.; Zhou, Q. EMBO J. 2002, 21, 6801–6810.
[132] Weeks, K. M.; Ampe, C.; Schultz, S. C.; Steitz, T. A.; Crothers, D. M. Science 1990, 249, 1281–1285.
[133] Gupta, B.; Levchenko, T. S.; Torchilin, V. P. Adv. Drug Deliv. Rev. 2005, 57, 637–651.
[134] Campbell, G. R.; Pasquier, E.; Watkins, J.; Bourgarel-Rey, V.; Peyrot, V.; Esquieu, D.; Barbier, P.; de Mareuil, J.; Braguer, D.; Kaleebu, P.; Yirrell, D. L.; Loret, E. P. J. Biol. Chem. 2004, 279, 48197–48204.

[135] Avraham, H. K.; Jiang, S.; Lee, T. H.; Prakash, O.; Avraham, S. J. Immunol. 2004, 173, 6228–6233. [136] Weissman, J. D.; Brown, J. A.; Howcroft, T. K.; Hwang, J.; Chawla, A.; Roche, P. A.; Schiltz, L.; Nakatani, Y.; Singer, D. S. Proc. Natl. Acad. Sci. USA 1998, 95, 11601– 11606. [137] Carroll, I. R.; Wang, J.; Howcroft, T. K.; Singer, D. S. Mol. Immunol. 1998, 35, 1171–1178. [138] Howcroft, T.; Strebel, K.; Martin, M.; Singer, D. Science 1993, 260, 1320–1322. [139] Opi, S.; Péloponèse, J.-M.; Esquieu, D.; Watkins, J.; Campbell, G.; De Mareuil, J.; Jeang, K. T.; Yirrell, D. L.; Kaleebu, P.; Loret, E. P. Vaccine 2004, 22, 3105–3111. [140] Jeang, K.-T. HIV-1 Tat: Structure and Function. In Human Retroviruses and AIDS 1996: A Compilation and Analysis of Nucleic Acid and Amino Acid Sequences; Myers, G.; Korber, B. T.; Foley, B. T.; Jeang, K.-T.; Mellors, J. W.; WainHobson, S., Eds.; Los Alamos National Laboratories: Los Alamos, 1996. [141] Neuveut, C.; Jeang, K. T. J. Virol. 1996, 70, 5572–5581. [142] Wu, Y.; Marsh, J. W. Microbes Infect. 2003, 5, 1023–1027. [143] Berkhout, B.; Silverman, R. H.; Jeang, K. T. Cell 1989, 59, 273–82. [144] Yamaguchi, Y.;

Takagi, T.;

Wada, T.;

Yano, K.;

Furuya, A.;

Sugimoto, S.;

Hasegawa, J.; Handa, H. Cell 1999, 97, 41–51. [145] Bourgeois, C. F.; Kim, Y. K.; Churcher, M. J.; West, M. J.; Karn, J. Mol. Cell. Biol. 2002, 22, 1079–1093.

252

[146] Kim, Y. K.; Bourgeois, C. F.; Isel, C.; Churcher, M. J.; Karn, J. Mol. Cell. Biol. 2002, 22, 4622–4637. [147] Schulte, A.;

Czudnochowski, N.;

Barboric, M.;

Schonichen, A.;

Blazek, D.;

Peterlin, B. M.; Geyer, M. J. Biol. Chem. 2005, 280, 24968–24977. [148] Bannwarth, S.; Gatignol, A. Curr. HIV Res. 2005, 3, 61–71. [149] Mujtaba, S.; He, Y.; Zeng, L.; Farooq, A.; Carlson, J. E.; Ott, M.; Verdin, E.; Zhou, M. M. Mol. Cell. Biol. 2002, 9, 575–586. [150] Dingwall, C.;

Ernberg, I.;

Gait, M. J.;

Green, S. M.;

Heaphy, S.;

Karn, J.;

Lowe, A. D.; Singh, M.; Skinner, M. A. EMBO J. 1990, 9, 4145–4153. [151] Pritchard, C. E.; Grasby, J. A.; Hamy, F.; Zacharek, A. M.; Singh, M.; Karn, J.; Gait, M. J. Nucleic Acids Res. 1994, 22, 2592–2600. [152] Aboul-ela, F.; Karn, J.; Varani, G. J. Mol. Biol. 1995, 253, 313–332. [153] Churcher, M. J.; Lamont, C.; Hamy, F.; Dingwall, C.; Green, S. M.; Lowe, A. D.; Butler, J. G.; Gait, M. J.; Karn, J. J. Mol. Biol. 1993, 230, 90–110. [154] Rana, T. M.; Jeang, K. T. Arch. Biochem. Biophys. 1999, 365, 175–85. [155] Bayer, P.; Kraft, M.; Ejchart, A.; Westendorp, M.; Frank, R.; Rosch, P. J. Mol. Biol. 1995, 247, 529–535. [156] Gregoire, C.; Péloponèse, J.-M.; Esquieu, D.; Opi, S.; Campbell, G.; Solomiac, M.; Lebrun, E.; Lebreton, J.; Loret, E. P. Biopolymers 2001, 62, 324–335. [157] Peloponese, J.-M. et al. C.R. Accad. Sci., Ser. III 2000, 323, 883–894. [158] Freund, J.; Vertesy, L.; Koller, K. P.; Wolber, V.; Heintz, D.; Kalbitzer, H. R. J. Mol. Biol. 1995, 250, 672–688. 253

[159] Puglisi, J. D.; Tan, R.; Calnan, B. J.; Frankel, A. D.; Williamson, J. R. Science 1992, 257, 76–80. [160] Long, K. S.; Crothers, D. M. Biochemistry 1999, 38, 10059–10069. [161] Seewald, M. J.; Metzger, A. U.; Willbold, D.; Rosch, P.; Sticht, H. J. Biomol. Struct. Dyn. 1998, 16, 683–692. [162] Metzger, A. U.; Bayer, P.; Willbold, D.; Hoffmann, S.; Frank, R. W.; Goody, R. S.; Rosch, P. Biochem. Biophys. Res. Commun. 1997, 241, 31–36. [163] Greenbaum, N. L. Structure 1996, 4, 5–9. [164] Hakansson, S.; Caffrey, M. Biochemistry 2003, 42, 8999–9006. [165] Gregoire, C. J.; Loret, E. P. J. Biol. Chem. 1996, 271, 22641–22646. [166] Shojania, S.; O’Neil, J. D. J. Biol. Chem. 2006, 281, 8347–8356. [167] Abragam, A. The Principles of Nuclear Magnetism; Clarendon Press: Oxford, 1961. [168] Ernst, R. R.; Bodenhausen, G.; Wokaun, A. Principles of Nuclear Magnetic Resonance in One and Two Dimensions; Clarendon Press: Oxford, 7th ed.; 2003. [169] Cavanagh, J.; Fairbrother, W. J.; Palmer, A. G.; Skelton, N. J. Protein NMR Spectroscopy: Principles and Practice; Academic Press: San Diego, 1996. [170] Goldman, M. Quantum Description of High Resolution NMR in Liquids; Oxford University Press: New York, 1991. [171] Neuhaus, D.; Williamson, M. P. The Nuclear Overhauser Effect in Structural and Conformational Analysis; John Wiley and Sons: New York, 2nd ed.; 2000.

254

[172] Seaborn, J. B. Hypergeometric Functions and Their Applications; Springer-Verlag: London, 1991. [173] McQuarrie, D. A. Quantum Chemistry; University Science Books: Mill Valley, 1983. [174] Harris, R. K. Nuclear Magnetic Resonance Spectroscopy; Longman: London, 1986. [175] Lipari, G.; Szabo, A. J. Am. Chem. Soc. 1982, 104, 4546–4559. [176] Lipari, G.; Szabo, A. J. Am. Chem. Soc. 1982, 104, 4559–4570. [177] Kay, L. E.; Torchia, D. A.; Bax, A. Biochemistry 1989, 28, 8972–8979. [178] Farrow, N. A.; Muhandiram, R.; Singer, A. U.; Pascal, S. M.; Kay, C. M.; Gish, G.; Shoelson, S. E.; Pawson, T.; Forman-Kay, J. D.; Kay, L. E. Biochemistry 1994, 33, 5984–6003. [179] Orekhov, V. Y.; Pervushin, K. V.; Korzhnev, D. M.; Arseniev, A. S. J. Biomol. NMR 1995, 6, 113–122. [180] Peng, J. W.; Thanbal, V.; Wagner, G. J. Magn. Reson. 1991, 94, 82–100. [181] Peng, J. W.; Wagner, G. Biochemistry 1992, 31, 8571–8586. [182] Luginbuhl, P.; Wüthrich, K. Prog. Nucl. Magn. Reson. Spectrosc. 2002, 40, 199–247. [183] Levitt, M. H. Spin Dynamics: Basics of Nuclear Magnetic Resonance; John Wiley and Sons, Ltd.: West Sussex, England, 1st ed.; 2001. [184] Korzhnev, D. M.; Billeterc, M.; Arsenievb, A. S.; Orekhov, V. Y. Prog. Nucl. Magn. Reson. Spectrosc. 2001, 38, 197–266. [185] Kelly, S. W.; Sholl, C. A. J. Phys.: Condens. Matter 1992, 4, 3317–3330. [186] Frankel, A. D.; Pabo, C. O. Cell 1988, 55, 1189–1193. 255

[187] Sambrook, J.; Russell, D. W. Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 3rd ed.; 2001. [188] Marley, J.; Lu, M.; Bracken, C. J. Biomol. NMR 2001, 20, 71–75. [189] Neidhardt, F. C.; Bloch, P. L.; Smith, D. F. J. Bacteriol. 1974, 119, 736–747. [190] Kay, L. E.; Keifer, P.; Saarinen, T. J. Am. Chem. Soc. 1992, 114, 10663–10665. [191] Delaglio, F.; Grzesiek, S.; Vuister, G. W.; Zhu, G.; Pfeifer, J.; Bax, A. J. Biomol. NMR 1995, 6, 277–293. [192] Shaka, A. J.; Keeler, J.; Frenkiel, T.; Freeman, R. J. Magn. Reson. 1983, 52, 335–338. [193] Wishart, D. S.; Bigam, C. G.; Yao, J.; Abildgaard, F.; Dyson, H. J.; Oldfield, E.; Markley, J. L.; Sykes, B. D. J. Biomol. NMR 1995, 6, 135–140. [194] Wittekind, M.; Mueller, L. J. Magn. Reson., Ser. B 1993, 101, 201–205. [195] Grzesiek, S.; Bax, A. J. Am. Chem. Soc. 1992, 114, 6291–6293. [196] Ikura, M.; Kay, L. E.; Bax, A. Biochemistry 1990, 29, 4659–4667. [197] Yamazaki, T.;

Lee, W.;

Revington, M.;

Mattiello, D. L.;

Dahlquist, F. W.;

Arrowsmith, C. H.; Kay, L. E. J. Am. Chem. Soc. 1994, 116, 6464–6465. [198] Vuister, G. W.; Bax, A. J. Am. Chem. Soc. 1993, 115, 7772–7777. [199] Muhandiram, D. R.; Kay, L. E. J. Magn. Reson., Ser. B 1994, 103, 203–216. [200] Kay, L. E.; Xu, G. Y.; Yamazaki, T. J. Magn. Reson., Ser. A 1994, 109, 129–133. [201] Kay, L. E.; Nicholson, L. K.; Delaglio, F.; Bax, A.; Torchia, D. A. J. Magn. Reson. 1992, 97, 359–375. 256

[202] Habazettl, J.; Myers, L. C.; Yuan, F.; Verdine, G. L.; Wagner, G. Biochemistry 1996, 35, 9335–9348. [203] Schwarzinger, S.; Kroon, G. J.; Foss, T. R.; Chung, J.; Wright, P. E.; Dyson, H. J. J. Am. Chem. Soc. 2001, 123, 2970–2978. [204] Schwarzinger, S.; Kroon, G. J.; Foss, T. R.; Wright, P. E.; Dyson, H. J. J. Biomol. NMR 2000, 18, 43–48. [205] Penkett, C. J.;

Redfield, C.;

Dodd, I.;

Hubbard, J.;

McBay, D. L.;

Mossakowska, D. E.; Smith, R. A. G.; Dobson, C. M.; Smith, L. J. J. Mol. Biol. 1997, 274, 152–159. [206] Palmer, A. G. Annu. Rev. Biophys Biomol. Struct. 2001, 30, 129–155. [207] Peng, J. W.; Wagner, G. J. Magn. Reson. 1992, 98, 308–332. [208] Farrow, N. A.; Zhang, O.; Forman-Kay, J. D.; Kay, L. E. Biochemistry 1995, 34, 868–878. [209] Farrow, N. A.; Zhang, O.; Szabo, A.; Torchia, D. A.; Kay, L. E. J. Biomol. NMR 1995, 6, 153–162. [210] Jarymowycz, V. A.; Stone, M. J. Chem. Rev. 2006, 106, 1624–1671. [211] Schwalbe, H.; Fiebig, K. M.; Buck, M.; Jones, J. A.; Grimshaw, S. B.; Spencer, A.; Glaser, S. J.; Smith, L. J.; Dobson, C. M. Biochemistry 1997, 36, 8977–8991. [212] Szyperski, T.; Luginbuhl, P.; Otting, G.; Guntert, P.; W¨ uthrich, K. J. Biomol. NMR 1993, 3, 151–164. [213] Lefevre, J. F.; Dayie, K. T.; Peng, J. W.; Wagner, G. Biochemistry 1996, 35, 2674–2686. 257

[214] Palmer, A. G.; Rance, M.; Wright, P. E. J. Am. Chem. Soc. 1991, 113, 4371–4380. [215] Mandel, A. M.; Akke, M.; Palmer, A. G. J. Mol. Biol. 1995, 246, 144–163. [216] Spyracopoulos, L. J. Biomol. NMR 2006, 36, 215–224. [217] Wolfram Research, Inc., Mathematica; Version 5.0 Wolfram Research, Inc.: Champaign, IL, 2004. [218] Creighton, T. E. Proteins : structures and molecular properties; W.H. Freeman: New York, 2nd ed ed.; 1993. [219] Andrec, M.; Montelione, G. T.; Levy, R. M. J. Magn. Reson. 1999, 139, 408–421. [220] Schurr, J. M.; Babcock, H. P.; Fujimoto, B. S. J. Magn. Reson., Ser. B 1994, 105, 211–224. [221] Clore, G. M.; Szabo, A.; Bax, A.; Kay, L. E.; Driscoll, P. C.; Gronenborn, A. M. J. Am. Chem. Soc. 1990, 112, 4989–4991. [222] Cole, K. S.; Cole, R. H. J. Chem. Phys. 1941, 9, 341–351. [223] Buevich, A. V.; Shinde, U. P.; Inouye, M.; Baum, J. J. Biomol. NMR 2001, 20, 233–249. [224] Buevich, A. V.; Baum, J. J. Am. Chem. Soc. 1999, 121, 8671–8672. [225] Ochsenbein, F.; Neumann, J.-M.; Guittet, E.; Heijenoort, C. V. Protein Sci. 2002, 11, 957–964. [226] Peng, J. W.; Wagner, G. Biochemistry 1995, 34, 16733–16752. [227] Laskowski, R. A.; MacArthurt, M. W.; Thornton, J. M. Curr. Opin. Struct. Biol. 1998, 8, 631–639. 258

[228] d’Auvergne, E. J.; Gooley, P. R. J. Biomol. NMR 2003, 25, 25–39. [229] Ochsenbein, F.;

Guerois, R.;

Neumann, J.-M.;

Sanson, A.;

Guittet, E.;

van

Heijenoort, C. J. Biomol. NMR 2001, 19, 3–18. [230] Bai, Y.; Milne, J. S.; Mayne, L.; Englander, S. W. Proteins 1993, 17, 75–86. [231] Gill, S. C.; von Hippel, P. H. Anal. Biochem. 1989, 182, 319–326. [232] Harris, J. L.; Backes, B. J.; Leonetti, F.; Mahrus, S.; Ellman, J. A.; Craik, C. S. Proc. Natl. Acad. Sci. USA 2000, 97, 7754–7759. [233] Hasan, A. A. K.; Amenta, S.; Schmaier, A. H. Circulation 1996, 94, 517–528. [234] Edwards, A. M.; Arrowsmith, C. H.; Christendat, D.; Dharamsi, A.; Friesen, J. D.; Greenblatt, J. F.; Vedadi, M. Nat. Struct. Biol. 2000, 7, 970–972. [235] Vertes, A.;

Benscura, A.;

Sadeghi, M.;

Wu, X. Adduct formation and energy

redistribution in UV and IR matrix-assisted laser desorption. In Proceedings of the Society of Photo-optical Instrumentation Engineers: Laser plasma generation and diagnostics, Vol. 3935; Haglund, R. F.; Wood, R. F., Eds.; SPIE: Bellingham, 2000. [236] Peti, W.; Smith, L. J.; Redfield, C.; Schwalbe, H. J. Biomol. NMR 2001, 19, 153–165. [237] Dyson, H. J.; Wright, P. E. Nat. Struct. Biol. 1998, 5, 499–503. [238] Yao, J.; Dyson, H. J.; Wright, P. E. FEBS Lett. 1997, 419, 285–289. [239] Zhang, O.; Forman-Kay, J. D.; Shortle, D.; Kay, L. E. J. Biomol. NMR 1997, 9, 181–200. [240] Wishart, D. S.; Sykes, B. D.; Richards, F. M. J. Mol. Biol. 1991, 222, 311–333. [241] Wishart, D. S.; Sykes, B. D. J. Biomol. NMR 1994, 4, 171–180. 259

[242] Frank, M. K.; Clore, G. M.; Gronenborn, A. M. Protein Sci. 1995, 4, 2605–2615. [243] Zhang, O.; Forman-Kay, J. D. Biochemistry 1995, 34, 6784–6794. [244] Smith, L. J.; Bolin, K. A.; Schwalbe, H.; MacArthur, M. W.; Thornton, J. M.; Dobson, C. M. J. Mol. Biol. 1996, 255, 494–506. [245] Zhang, H.;

Leung, A.;

Wishart, D. “The THRIFTY web server, version 1.0”,

http://redpoll.pharmacy.ualberta.ca/thrifty/, University of Alberta, Edmonton, 2005. [246] DeLano, W. “MacPyMOL: A PyMOL-based Molecular Graphics Application for MacOS X”, http://www.pymol.org, DeLano Scientific LLC, San Francisco, 2005. [247] Hu, Y.; Macinnis, J. M.; Cherayil, B. J.; Fleming, G. R.; Freed, K. F.; Perico, A. J. Chem. Phys. 1990, 93, 822–836. [248] Ulrich, D. L.; Kojetin, D.; Bassler, B. L.; Cavanagh, J.; Loria, J. P. J. Mol. Biol. 2005, 347, 297–307. [249] Thormann, T.; Soroka, V.; Nielbo, S.; Berezin, V.; Bock, E.; Poulsen, F. M. Biochemistry 2004, 43, 10364–10369. [250] Bhavesh, N. S.; Sinha, R.; Mohan, P. M.; Hosur, R. V. J. Biol. Chem. 2003, 278, 19980–19985. [251] Otzen, D. E.; Miron, S.; Akke, M.; Oliveberg, M. Biochemistry 2004, 43, 12964– 12978. [252] Schwarzinger, S.; Wright, P. E.; Dyson, H. J. Biochemistry 2002, 41, 12681–12686. [253] Platt, G. W.; McParland, V. J.; Kalverda, A. P.; Homans, S. W.; Radford, S. E. J. Mol. Biol. 2005, 346, 279–294.

260

[254] Yao, J.; Chung, J.; Eliezer, D.; Wright, P. E.; Dyson, H. J. Biochemistry 2001, 40, 3561–71. [255] Redfield, C. Methods 2004, 34, 121–132. [256] Magnuson, D. S. K.; Knudsen, B. E.; Geiger, J. D.; Brownstone, R. M.; Nath, A. Ann. Neurol. 1995, 37, 373–380. [257] Hakansson, S.; Jacobs, A.; Caffrey, M. Protein Sci. 2001, 10, 2138–2139. [258] Studier, F. W.; Rosenberg, A. H.; Dunn, J. J.; Dubendorff, J. W. Methods Enzymol. 1990, 185, 60–89. [259] Han, J. C.; Han, G. Y. Anal. Biochem. 1994, 220, 5–10. [260] Gough, J. D.; R. H., J. W.; Donofrio, A. E.; Lees, W. J. J. Am. Chem. Soc. 2002, 124, 3885–3892. [261] Wüthrich, K. NMR of proteins and nucleic acids; Wiley: New York, 1986. [262] Eriksson, M. A.; Härd, T.; Nilsson, L. Biophys. J. 1995, 69, 329–339. [263] Teilum, K.; Kragelund, B. B.; Poulsen, F. M. J. Mol. Biol. 2002, 324, 349–357. [264] Garcia, P.; Serrano, L.; Durand, D.; Rico, M.; Bruix, M. Protein Sci. 2001, 10, 1100–1112. [265] Bhavesh, N. S.; Juneja, J.; Udgaonkar, J. B.; Hosur, R. V. Protein Sci. 2004, 13, 3085–3091. [266] Gray, T. M.; Arnoys, E. J.; Blankespoor, S.; Born, T.; Jagar, R.; Everman, R.; Plowman, D.; Stair, A.; Zhang, D. Protein Sci. 1996, 5, 742–751. [267] MacArthur, M. W.; Thornton, J. M. J. Mol. Biol. 1991, 218, 397–412. 261

[268] Suh, J.-Y.; Lee, Y.-T.; Park, C.-B.; Lee, K.-H.; Kim, S.-C.; Choi, B.-S. Eur. J. Biochem. 1999, 266, 665–674. [269] Alexandrescu, A. T.; Shortle, D. J. Mol. Biol. 1994, 242, 527–546. [270] Lavalette, D.; Tetreau, C.; Tourbez, M.; Blouquit, Y. Biophys. J. 1999, 76, 2744– 2751. [271] Dayie, K. T.; Wagner, G.; Lefevre, J.-F. Annu. Rev. Phys. Chem. 1996, 47, 243–282. [272] Buck, M.; Schwalbe, H.; Dobson, C. M. J. Mol. Biol. 1996, 257, 669–683. [273] Bai, Y.; Chung, J.; Dyson, H. J.; Wright, P. E. Protein Sci. 2001, 10, 1056–1066. [274] Garber, M. E.; Wei, P.; KewalRamani, V. N.; Mayall, T. P.; Herrmann, C. H.; Rice, A. P.; Littman, D. R.; Jones, K. A. Genes Dev. 1998, 12, 3512–3527. [275] Kelly, G. P.; Muskett, F. W.; Whitford, D. Eur. J. Biochem. 1997, 245, 349–354. [276] Cao, W.; Bracken, C.; Kallenbach, N. R.; Lu, M. Protein Sci. 2004, 13, 177–189. [277] Daughdrill, G. W.; Vise, P. D.; Zhou, H.; Yang, X.; Yu, W.-F.; Tasayco, M. L.; Lowry, D. F. J. Biomol. Struct. Dyn. 2004, 21, 663–670. [278] Torchia, D. A.; Lyerla, J. R.; Quattrone, A. J. Biochemistry 1975, 14, 887-900. [279] Goodman, J. L.; Pagel, M. D.; Stone, M. J. J. Mol. Biol. 2000, 295, 963–978. [280] Clore, G. M.; Driscoll, P. C.; Wingfield, P. T.; Gronenborn, A. M. Biochemistry 1990, 29, 7387–7401. [281] Korzhnev, D. M.; Orekhov, V. Y.; Arseniev, A. S. J. Magn. Reson. 1997, 127, 184–191. [282] Chen, J.; Brooks, C. L.; Wright, P. E. J. Biomol. NMR 2004, 29, 243–257. 262

[283] Henry, G. D.; Sykes, B. D. J. Magn. Reson. Ser. B 1993, 102, 193–200. [284] Koide, S.; Jahnke, W.; Wright, P. E. J. Biomol. NMR 1995, 6, 306–312. [285] Palmer, A. G.; Massi, F. Chem. Rev. 2006, 106, 1700–1719. [286] Dempsey, C. E. Prog. Nucl. Magn. Reson. Spectrosc. 2001, 39, 135–170. [287] Matthew, J.; Richards, F. J. Biol. Chem. 1983, 258, 3039–3044. [288] Wei, P.; Garber, M. E.; Fang, S. M.; Fischer, W. H.; Jones, K. A. Cell 1998, 92, 451–62.

263

Appendix A Resonance Assignments for His-tagged Tat1−72

Table A.1: Resonance assignments of Histidine-tagged Tat1−72 determined at pH 4.1 and 293 K

Position  Residue Type  HN (ppm)  N (ppm)   Cα (ppm)  Cβ (ppm)  C’ (ppm)  Hα (ppm)
1         MET (M)       -         -         -         -         -         -
2         GLY (G)       -         -         -         -         -         -
3         SER (S)       8.706     115.583   58.720    64.670    174.542   4.521
4         SER (S)       8.508     117.930   58.825    64.393    174.398   4.389
5         HIS (H)       8.582     120.104   55.764    29.599    174.101   4.66
6         HIS (H)       8.551     119.261   55.772    29.851    174.109   4.642
7         HIS (H)       8.733     120.061   55.913    29.941    174.126   4.666
8         HIS (H)       8.803     120.545   56.009    29.730    174.109   4.667
9         HIS (H)       8.813     121.030   56.040    29.755    174.076   4.658
10        HIS (H)       8.793     121.694   56.068    30.060    174.133   4.707
11        SER (S)       8.596     118.846   59.008    64.521    174.424   -
12        SER (S)       8.600     118.500   58.394    63.935    174.857   4.483
13        GLY (G)       8.449     110.714   45.918    -         173.750   3.953
14        LEU (L)       8.142     121.815   55.668    43.111    177.196   4.356
15 a      VAL (V)       8.224     123.400   60.486    33.390    174.409   4.396
16        PRO (P)       -         -         63.699    32.340    176.972   -
17        ARG (R)       8.529     122.210   56.824    31.519    177.037   -
18        GLY (G)       8.513     110.532   45.950    -         174.104   3.984
19        SER (S)       8.208     115.439   58.824    64.529    174.197   4.42
20        HIS (H)       8.626     120.212   56.037    29.640    174.036   4.705
21        MET (M)       -         -         -         -         -         -
22        GLU (E)       8.551     124.631   54.666    31.068    173.814   4.306
23        PRO (P)       -         -         62.917    32.303    176.667   -
24 a      VAL (V)       8.223     120.681   62.657    33.578    175.602   -
25 a      ASP (D)       8.428     126.118   52.331    42.062    175.127   -
26        PRO (P)       -         -         63.699    33.551    176.906   -
27        ARG (R)       8.550     121.984   56.618    30.072    176.346   4.324
28        LEU (L)       7.883     120.144   55.564    42.895    176.983   4.268
29        GLU (E)       7.965     120.480   54.695    29.759    174.334   4.134
30        PRO (P)       -         -         64.394    32.967    177.573   -
31        TRP (W)       8.399     118.809   57.559    30.792    176.842   4.132
32        LYS (K)       8.416     121.884   56.000    33.723    175.776   4.455
33        HIS (H)       8.098     119.158   53.735    29.243    172.236   4.849
34        PRO (P)       -         -         63.900    32.933    177.600   -
35        GLY (G)       8.638     109.933   45.959    -         174.188   4.006
36        SER (S)       8.314     115.643   59.210    64.409    174.832   4.428
37        GLN (Q)       8.568     122.214   56.618    30.072    176.176   4.398
38        PRO (P)       -         -         62.916    33.649    176.478   -
39        LYS (K)       8.359     119.721   56.884    33.702    174.289   -
40        THR (T)       8.097     115.159   62.291    70.542    174.150   4.304
41        ALA (A)       8.393     126.505   53.110    20.015    177.519   4.368
42        CYS (C)       8.419     119.011   59.177    28.745    174.909   4.572
43        THR (T)       8.289     116.541   62.583    70.306    174.211   4.41
44        ASN (N)       8.432     120.995   53.951    39.487    174.999   -
45 a      CYS (C)       8.254     119.256   59.241    28.689    174.227   -
46        TYR (Y)       8.307     122.671   57.958    36.497    175.523   -
47        CYS (C)       8.119     121.208   58.887    28.768    173.745   4.388
48        LYS (K)       7.607     122.380   56.818    33.682    175.552   4.153
49        LYS (K)       8.332     123.080   57.127    33.734    176.583   -
50        CYS (C)       8.405     120.658   59.147    28.809    173.635   -
51 a      CYS (C)       8.384     121.717   59.120    28.819    174.928   -
52        PHE (F)       8.381     123.552   58.265    40.414    174.110   4.549
53        HIS (H)       8.426     120.817   53.951    28.311    174.036   4.685
54 a      CYS (C)       8.416     121.317   58.792    29.286    172.737   -
55 a      GLN (Q)       8.177     126.982   58.149    31.060    172.164   -
56        VAL (V)       8.357     122.442   62.991    33.699    176.058   -
57 a      CYS (C)       8.664     123.713   58.757    28.794    173.638   4.517
58        PHE (F)       8.576     124.503   58.649    40.433    175.247   4.598
59 a      ILE (I)       8.193     123.581   61.567    39.719    175.932   4.234
60 a      THR (T)       8.331     120.290   62.526    70.468    174.160   4.31
61        LYS (K)       8.536     125.652   56.852    33.819    175.879   4.277
62 a      ALA (A)       8.332     125.550   53.115    19.884    177.733   4.31
63 a      LEU (L)       8.282     122.012   53.035    43.177    178.001   4.291
64 a      GLY (G)       8.394     109.454   46.064    -         174.092   3.925
65        ILE (I)       7.978     120.084   61.858    39.541    176.267   4.135
66 a      SER (S)       8.335     119.400   58.695    64.365    174.223   4.433
67        TYR (Y)       8.258     122.864   58.847    39.496    176.454   4.53
68 a      GLY (G)       8.361     110.077   46.148    -         174.137   3.887
69 a      ARG (R)       8.163     120.671   56.893    31.444    176.538   4.3
70        LYS (K)       8.339     122.588   56.995    33.717    176.058   -
71        LYS (K)       8.360     122.962   57.127    33.734    176.583   4.266
72        ARG (R)       8.516     123.710   54.846    29.890    174.224   4.27
73        ARG (R)       8.536     123.441   56.685    31.684    176.000   4.302
74        GLN (Q)       8.546     122.755   56.364    30.662    176.176   4.333
75        ARG (R)       8.497     122.547   56.364    30.155    175.836   -
76        ARG (R)       8.556     123.736   56.685    31.684    176.112   -
77        ARG (R)       -         -         -         -         -         -
78        PRO (P)       -         -         -         -         -         -
79        PRO (P)       -         -         63.651    32.675    176.979   -
80        GLN (Q)       8.580     121.074   56.712    30.468    176.616   4.309
81        GLY (G)       8.531     110.671   45.950    -         174.217   -
82        SER (S)       8.235     115.555   58.916    64.529    174.357   4.345
83        GLN (Q)       8.450     123.127   54.366    29.763    174.224   4.583
84        THR (T)       8.151     114.849   62.559    70.371    174.342   4.262
85        HIS (H)       8.549     120.642   55.834    29.771    174.099   4.708
86        GLN (Q)       8.484     122.253   56.261    31.444    175.965   4.35
87        VAL (V)       8.358     122.331   62.991    33.699    176.058   4.151
88        SER (S)       8.456     119.760   58.644    64.388    174.476   4.475
89        LEU (L)       8.444     125.073   55.836    43.067    177.434   4.389
90        SER (S)       8.308     117.186   58.833    64.325    173.486   4.426
91 a      LYS (K)       8.349     123.790   57.046    33.704    175.646   4.334
92        GLN (Q)       8.060     126.607   57.879    31.130    172.329   4.169

Values 43.418 and 178.270 (page listing residues 1–5), 4.837 (residues 24–41) and 4.185 (residues 42–59) appear in the source without a recoverable row.

Table A.2: Additional peak assignments from ¹H/¹⁵N-HSQC of ¹³C/¹⁵N-labelled Histidine-tagged Tat1−72 determined at pH 4.1 and 293 K

Residue Type  Assignment  HN (ppm)  N (ppm)
GLY           G64 b       8.354     109.262
GLY           G           8.504     110.149
GLY           G68 b       8.406     110.216
GLY           G           8.524     110.288
SER           S           8.289     116.541
CYS           C57 b       7.467     117.651
CYS           C45 b       8.157     119.191
THR           T60 b       8.196     119.256
SER           S66 b       8.342     119.406
LYS           K           8.371     119.722
HIS           H5/20       8.61      120.007
CYS           C57 c       8.415     120.048
ARG           R69 b       8.493     120.181
THR           T60 c       8.34      120.291
VAL           V24 b       8.313     120.531
THR           T           8.61      120.644
CYS           C54 b       8.373     120.683
ARG           R           8.559     120.962
CYS           C51 b       8.306     121.554
LEU           L63 b       8.638     121.766
LEU           L63 c       8.242     121.78
LEU           L63 d       8.307     122.169
GLN           Q           8.591     122.302
LYS           K           8.384     122.592
TYR/PHE       Y/F         8.272     122.733
TYR/PHE       Y/F         8.302     123.079
GLN           Q           8.45      123.127
GLN           Q55 b       8.592     123.257
CYS           C           8.385     123.268
GLN           Q55 c       8.609     123.349
VAL           V15 b       8.234     123.399
ILE           I59 b       8.181     123.581
CYS           C57 d       8.659     123.711
LYS           K           8.342     124.364
ALA           A62 b       8.311     125.386
ASP           D25 b       8.367     125.458
ALA           A62 c       8.368     125.892
ALA           A62 d       8.395     126.169
GLN           Q           8.073     126.612
LYS           K91 b       8.034     127.997

Appendix B Model-Free Parameter Estimates for His-tagged Tat1−72

[Figure B.1 plots: (a) S² versus residue; (b) τc (ns) versus residue; (c) τe (ps) versus residue]

Figure B.1: Model-free parameter estimates using Model 2 (Rf = 0.262) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at a single 14.1 T field using 63 residues. Residue S4 was omitted as an outlier of the parameter estimates. (a) Generalized order parameters S²; (b) local rotational correlation times τc (ns); (c) internal correlation times τe (ps). The sequence mean values of the estimates are indicated by the solid lines.
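The three quantities plotted in Figure B.1 (S², τc and τe) are the parameters of the Lipari–Szabo model-free spectral density [175, 176]. The following is a minimal sketch of that function in its standard textbook form, J(ω) = (2/5)[S²τc/(1 + ω²τc²) + (1 − S²)τ/(1 + ω²τ²)] with 1/τ = 1/τc + 1/τe. It assumes this form corresponds to "Model 2" as used here; the function name and the numerical inputs are hypothetical and are not estimates taken from the figure.

```python
import math

def spectral_density(omega, s2, tau_c, tau_e):
    """Lipari-Szabo model-free J(omega), in seconds.

    omega : angular frequency (rad/s)
    s2    : generalized order parameter S^2 (0..1)
    tau_c : local rotational correlation time (s)
    tau_e : internal correlation time (s)
    """
    # Effective correlation time: 1/tau = 1/tau_c + 1/tau_e
    tau = tau_c * tau_e / (tau_c + tau_e)
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)

# Hypothetical inputs, chosen only for illustration.
GAMMA_N = 2.712e7          # approximate magnitude of the 15N gyromagnetic ratio, rad s^-1 T^-1
B0 = 14.1                  # field strength (T), as in Figure B.1
omega_n = GAMMA_N * B0     # 15N Larmor frequency (rad/s)

s2, tau_c, tau_e = 0.5, 3e-9, 100e-12
j_zero = spectral_density(0.0, s2, tau_c, tau_e)
j_n = spectral_density(omega_n, s2, tau_c, tau_e)
print(f"J(0) = {j_zero:.3e} s, J(omega_N) = {j_n:.3e} s")
```

Setting S² = 1 removes the internal-motion term and leaves the single Lorentzian of a rigid rotor, which is a convenient check on any implementation.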

[Figure B.2 plots: (a) S² versus residue; (b) τc (ns) versus residue; (c) τe (ps) versus residue]

Figure B.2: Model-free parameter estimates using Model 2 (Rf = 0.115) with errors determined from 100 Monte Carlo sets using relaxation data (R1, R1ρ and NOE) collected at a single 18.8 T field using 60 residues. No residues were omitted as outliers of the parameter estimates. (a) Generalized order parameters S²; (b) local rotational correlation times τc (ns); (c) internal correlation times τe (ps). The sequence mean values of the estimates are indicated by the solid lines.
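The captions of Figures B.1 and B.2 state that parameter errors were determined from 100 Monte Carlo sets. The sketch below illustrates the general idea on a deliberately simpler problem, a single-exponential decay fitted by least squares on ln(I), rather than the model-free fit itself. It assumes Gaussian noise of known standard deviation added to the back-calculated curve; how the sets were actually generated is not stated in this appendix, and all names and numbers here are hypothetical.

```python
import math
import random
import statistics

def fit_decay(times, intensities):
    """Fit I(t) = I0 * exp(-R * t) by least squares on ln(I); returns (R, I0)."""
    logs = [math.log(i) for i in intensities]
    t_mean = statistics.fmean(times)
    y_mean = statistics.fmean(logs)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    rate = -sxy / sxx
    return rate, math.exp(y_mean + rate * t_mean)

def monte_carlo_error(times, intensities, noise_sd, n_sets=100, seed=0):
    """Return (fitted rate, Monte Carlo standard deviation of the rate)."""
    rng = random.Random(seed)
    rate, i0 = fit_decay(times, intensities)
    # Back-calculate the ideal curve from the best-fit parameters.
    model = [i0 * math.exp(-rate * t) for t in times]
    rates = []
    for _ in range(n_sets):
        # Each synthetic set is the ideal curve plus Gaussian noise (assumed noise model).
        noisy = [m + rng.gauss(0.0, noise_sd) for m in model]
        rates.append(fit_decay(times, noisy)[0])
    return rate, statistics.stdev(rates)

# Hypothetical noise-free decay with R = 1.5 s^-1 and I0 = 100.
times = [0.01, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
intensities = [100.0 * math.exp(-1.5 * t) for t in times]
rate, rate_err = monte_carlo_error(times, intensities, noise_sd=1.0)
print(f"R = {rate:.3f} +/- {rate_err:.3f} s^-1")
```

The same resampling loop applies to a multi-parameter fit: the fitted model is used to regenerate the observables, noise at the experimental uncertainty is added, and the spread of the refitted parameters is taken as the error.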