Signal processing-based bioinformatics methods for characterisation and identification of bio-functionalities of proteins
Nwankwo Norbert
Faculty of Technology
De Montfort University
A thesis submitted for the degree of PhD in Bioinformatics July, 2012
Dedication
This thesis is primarily dedicated to:
The Almighty God, The Faithful One, The Beginning and The End.
My loved ones, including my loving daughters, Chinazor Nwankwo and Chioma Nwankwo.
My parents, Mr. and Mrs. Felix Nwankwo.
Cosic I and Veljkovic V, for discovering the Digital Signal Processing-based approaches to analyzing protein residues, and the techniques Resonant Recognition Model (RRM) and Informational Spectrum Method (ISM).
Acknowledgements
I remain grateful to Almighty God for His Divine guard, guidance and sustenance, for "if not for His love and mercies, I would have been consumed" (Lamentations 3:22-23). Because of You, Lord God Almighty, this program has become possible. Not even the joy of proposing the Bioinformatics device called the Computer-Aided Drug Resistance Calculator in the course of this research can compensate for the pain of leaving my two daughters, Chinazor Nwankwo and Chioma Nwankwo, far away for such a long time. I count their contributions greatest. In everything, we give thanks and praises to the Almighty God and pray that His Presence, Love and Blessings continually remain with us.
I am immensely indebted to my supervisor, Dr. Huseyin Seker, not only for standing by me all through this program but also for his enormous role. I also want to appreciate my academic mentor, Dr. Amos Abioye. My thanks go to Mr. Charalambos Chrysostomou, a colleague who assisted in developing the algorithm.
This program is privately sponsored. The researcher is unemployed. However, I want to appreciate the refunded initial assistance from Steve Oziko of New Health Pharmacy, Abuja, Nigeria. For their wonderful understanding and financial support, I remain grateful to my parents, Mr and Mrs Felix Nwankwo, and all my siblings. I would like to thank my undergraduate classmates Prof. Okafor I. S. of the University of Jos, Nigeria and Prof. Ibezim E. C. of the University of Nigeria, Nsukka. I am grateful to Dr. Moses Njoku of NIPRD, Abuja, Nigeria. My thanks go to the ministers of God whose prayers have helped see me through. They are Pastors C. Iloka, Joe and Joel (UK) and Uche (Ivory Coast). For all others who have played any form of role during this research, may the Almighty God, the Rewarder, reward you all accordingly and in multiple folds.
Declaration
The work described in this thesis is the original work of the author except where specific reference or acknowledgment is made to the work or contribution of others. The following conference papers and journal articles have been produced from the PhD research within the thesis. All the authors have made valuable contributions to these publications and I have referred to my specific role at the beginning of each chapter where they are used.
1. Nwankwo N, Seker H. 2010. "Assessment of the Binding Characteristics of Human Immunodeficiency Virus Type 1 Glycoprotein120 and Host Cluster of Differentiation4 Using Digital Signal Processing". Conf. Proc. IEEE BIBE 2010:289-290.
2. Nwankwo N, Seker H. 2010. "A signal processing-based bioinformatics approach to assessing drug resistance: human immunodeficiency virus as a case study". Conf. Proc. IEEE Eng Med Biol Soc. 2010:1836-1839.
3. Nwankwo N, Seker H. 2011. "Digital Signal Processing Techniques: Calculating the Biological Functionalities of Proteins". J Proteomics Bioinform 4: 260-268. doi:10.4172/jpb.1000199.
4. Nwankwo N, Seker H. 2011. "Preliminary Investigations into the Binding Interactions between Plasmodial and Host Proteins Using Computational Approaches". J Proteomics Bioinform 4: 269-277. doi:10.4172/jpb.1000200.
5. Nwankwo N, Seker H. 2012. "HIV Progression to AIDS: A Bioinformatic Approach to Determining the Mechanism of Action". Current HIV Research Journal.
Abstract
This research entails sequence analysis using Bioinformatics techniques. The aim is to investigate the biological functionalities of proteins as well as protein-protein interactions, in order to understand disease processes; design, assess and compare therapeutic interventions; and develop devices that would help in assessing the efficacies of therapeutic agents. Clinical approaches are known to be labour-intensive, slow and resource-consuming, and require rationalization. Techniques including Digital Signal Processing (DSP)-based methods such as the Resonant Recognition Model (RRM), the Informational Spectrum Method (ISM) and the Continuous Wavelet Transform (CWT) are engaged. Two top-killer diseases, Human Immunodeficiency Virus (HIV) and Malaria, are studied.
In an attempt to denature the reportedly fairly unstable surface protein (HIV gp120), so as to deactivate HIV and possibly cure AIDS, it was found that mutations in HIV gp120 are linked to its numerous physiological characteristics. Further study of the effects of mutations on HIV gp120 helped demonstrate the mechanism by which HIV progresses to AIDS. Using these bioinformatics approaches, Tropic and Phenotypic associations of HIV, and the relationships that exist between and amongst HIV and Simian Immunodeficiency Virus (SIV) isolates and their host species, are identified. African isolates such as the Zairian WMJ1 and the Cameroonian 96CM-MP535 are also found to share common biological functionality with an American isolate, SC, suggesting cross-Atlantic transmission. Resistance arising from the exposure of five anti-HIV/AIDS drugs to their target proteins is further assessed. This led to the proposed bioinformatics tool called the Computer-Aided Drug Resistance Calculator (CADRC). In addition, binding interactions that exist between Plasmodial and host proteins are predicted. The results help strengthen the case that proteins' biological functionalities and interactions are sequence-content-dependent, and can be predicted.
Finally, the continuous wavelet transform-based approach is engaged to identify the connecting peptides that separate the helices of the HIV gp41 and its crystallographic product, the 1DF5. This strengthens the reliability of the technique. In conclusion, bioinformatics approaches are found to be rational and appropriate for assessing biological functionalities and interactions, and for inventing therapeutic interventions such as remedies for recalcitrant HIV, especially when applied to their novel target proteins.
Contents
Dedication
Acknowledgements
Declaration
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 The Prologue:
1.2 The Main Contributions of the PhD Thesis:
1.3 PhD Thesis Structure
2 Literature Review
2.1 Preamble
2.1.1 Conventional Clinical Investigations
2.1.2 Analysis of Protein Functionalities Using Sequence Information
2.2 The Basis of Bioinformatics Approaches in Investigating the Biofunctionalities
2.2.1 Bioinformatics: Definition and the Rationale for the Approach
2.2.2 Protein/Peptidic Therapeutics
2.2.3 Amino Acid Scale (AAS)
2.2.3.1 Electron-Ion Interaction Pseudo-Potential (EIIP)
2.2.3.2 Hydrophobicity Index
2.2.4 Protein Sequencing
2.3 Digital Signal Processing (DSP)
2.3.1 Discrete Fourier Transform (DFT)
2.3.2 Resonant Recognition Model (RRM)
2.3.2.1 Introduction
2.3.3 Informational Spectrum Method (ISM)
2.3.4 Wavelet Transform (WT) Method
3 Methodology
3.1 Introduction
3.2 The Methodologies
3.2.1 Resonant Recognition Model (RRM)
3.2.1.1 Explanation
3.2.2 Informational Spectrum Method (ISM)
3.2.3 Continuous Wavelet Transform (CWT) Method
3.3 Investigations
3.3.1 Comparison of the Potencies of starter materials for vaccine design: Plasmodial Peptides P18 and P32
3.3.1.1 The Experiment
3.3.1.2 Results and Discussions
3.3.2 Comparison of the efficacies of drugs: HIV Fusion Inhibitors (Enfuvirtide and Sifuvirtide)
3.3.2.1 The Experiment
3.3.2.2 Results and Discussions
3.3.3 Effect of Mutations on the pharmacological activities of Drugs
3.3.3.1 The Experiment
3.3.3.2 Results and Discussions
4 Resonant Recognition Model-based Approach to Identifying the Mechanism of HIV Transformation into AIDS
4.1 Summary:
4.2 Introduction
4.3 Materials and Methods:
4.3.1 Materials
4.4 Resonant Recognition Model (RRM)
4.5 Results
4.5.1 Consensus Frequency (CF):
4.5.2 Cross-spectral Analysis
4.5.3 Spectral Features:
4.6 Discussions
4.7 Conclusions
5 Resonant Recognition-Based Characterization and Identification of HIV Tropic and Phenotypic Associations
5.1 Summary:
5.2 Introduction
5.3 Materials and Method
5.3.1 Materials
5.3.2 Resonant Recognition Model (RRM)
5.4 Results
5.5 Discussion
5.5.1 Cross-Spectral Analysis:
5.5.2 HIV-1 Viruses
5.5.2.1 MFA Isolate (HIV-1 T-tropic Virus):
5.5.2.2 HIV-1 T-tropic Viruses:
5.5.2.3 HIV-1 M-tropic Viruses:
5.5.2.4 Dual HIV-1 Isolates:
5.5.3 HIV-2 Isolates:
5.5.4 SIV Isolates:
5.5.5 HIV Phenotypic Associations:
5.5.6 Prediction of HIV Tropic and Phenotypic Associations:
5.5.7 Maximum Amplitude-based Categorization:
5.5.7.1 HIV and SIV Isolates:
5.5.7.2 Host Organisms:
5.6 Conclusions:
6 Informational Spectrum Method-based Approach to Assessing Drug Resistance: HIV as a case study
6.1 Summary:
6.2 Introduction
6.3 Materials and Methods:
6.3.1 Materials:
6.3.1.1 Fusion/Entry Inhibitors (example: Enfuvirtide)
6.3.1.2 Nucleotide Reverse Transcriptase Inhibitors (example: Lamivudine)
6.3.1.3 Protease Inhibitors (example: Darunavir)
6.3.1.4 Maturation Inhibitors (example: Bevirimat (BVM))
6.3.1.5 Integrase Enzyme Inhibitors (example: Raltegravir)
6.3.2 Informational Spectrum Method (ISM)
6.4 Results
6.4.1 Fusion Inhibitors (example: Enfuvirtide)
6.4.2 Protease Inhibitors (example: Darunavir)
6.4.3 Nucleotide Reverse Transcriptase Inhibitors (example: Lamivudine)
6.4.4 Maturation Inhibitors (example: Bevirimat (BVM))
6.4.5 Integrase Enzyme Inhibitors (example: Raltegravir)
6.5 Discussions
6.6 Conclusions
7 Resonant Recognition Model-based Approach to Investigating Protein Binding Interactions
7.1 Summary
7.2 Introduction
7.3 Materials and Methods:
7.3.1 Materials:
7.4 Results
7.5 Discussions
7.6 Conclusions
8 Continuous Wavelet Transform-Based Prediction of Connecting Peptides: HIV gp41 Core Protein and 1DF5 as a Case Study
8.1 Summary:
8.2 Introduction
8.3 Materials and Method
8.3.1 Materials
8.3.2 Continuous Wavelet Transform (CWT)
8.4 Results
8.5 Discussion
8.6 Conclusion
9 Conclusions
9.1 Introduction
9.1.1 HIV Progression to AIDS
9.1.2 Characterization and Identification of the HIV Tropic and Phenotypic Associations
9.1.3 Bioinformatics Approach to Assessing Drug Resistance
9.1.4 Prediction of Binding Interactions In Plasmodial and Host Proteins
9.1.5 Continuous Wavelet Transform-Based Study of the Connecting Peptides of HIV gp41 and 1DF5
9.2 Future work
References
Appendix A
List of Figures
2.1 Scalogram of J6 HCV core protein showing the maximum wavelet coefficient (red) representing the position of the protein hydrophobic regions like helices, and the minimum wavelet coefficient (blue) signifying the hydrophilic domains like the connecting peptides. Hydrophobic Amino Acids Scale is engaged in the analysis.
3.1 Illustration of the Resonant Recognition Model (RRM), Informational Spectrum Method (ISM) and Wavelet Transform (WT) procedures.
3.2 Symmetric (mirror) image of the Spectral characteristic of Pep2 as an illustration of a DFT property. Half of the image is therefore engaged in the studies.
3.3 Spectral Characteristics of peptides (A) Pep1 and (B) Pep2 showing y-axis as the normalized amplitude values (biological information) and x-axis as the frequency or positions of the biological activities.
3.4 Cross-Spectral features (normalized) of peptides Pep1 and Pep2, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis.
3.5 Alpha-helix structure of peptide 1ALI (derived from [1]).
3.6 Amino acids sequence and KYTJ820101-based signal of the 1ALI showing the helix structure as the shaded area.
3.7 KYTJ820101-based scalogram of the 1ALI obtained through CWT-based analysis, showing the predicted regions in the form of maximum wavelet coefficient (red) and minimum wavelet coefficient (blue). Note: An overlap of the red and blue, a characteristic of the amphiphilic nature of 1AL1, resulted in green colouration between positions 6 and 8.
3.8 (A) Plasmodial Peptide P18: sequence and EIIP-based encoded values, and (B) EIIP-based signal of P18, which is the numerical sequence that is processed using DFT.
3.9 (A) Plasmodial Peptide P32: sequence and EIIP-based encoded values, and (B) EIIP-based signal of P32, which is the numerical sequence that is processed using DFT.
3.10 Results of the Spectral characteristics of (A) Peptide P18 and (B) Peptide P32 showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis. The Figure demonstrates amplitude of 1.00 at position 18.
3.11 Results of the Cross-spectral characteristic of Peptides P18 and P32 showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis. The Figure demonstrates amplitude of 1.00 at position 18.
3.12 Results of the CHAM830107-based Informational Spectra (IS) of (A) P18 and (B) P32, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis.
3.13 Results of the CHAM830107-based Common Informational Spectrum (CIS) of P18 and P32, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis.
3.14 Predicted efficacy of P18 (blue) and P32 (orange) as a starter material for Malaria vaccine, showing the comparison between the biological input by each amino acid scale in the designing of the Malaria vaccine.
3.15 Predicted Potency of anti-HIV/AIDS drugs: Enfuvirtide (red) and Sifuvirtide (pink), demonstrating the comparison in the pharmacological activities of the two anti-retroviral agents.
3.16 The results of the Spectral characteristics of (A) nDARC and (B) nDARC mutant Y41F using EIIP, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis.
3.17 Cross-spectral features of nDARC and mutant nDARC Y41F, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis.
4.1 Schematic representation of the three main stages of the HIV progression to AIDS showing the diminished M-tropic viral population and predominance of T-tropic viruses.
4.2 Amino acids sequences of the V3 domain of HIV isolates showing mutations in elongated letters, and SF162 Dual-tropism. The symbols * and + represent same residue and deletion, respectively. The consensus sequence is derived from [2].
4.3 Flow chart of the methodology engaged in the analysis of the HIV transition to AIDS.
4.4 Cross-spectral feature of the host CD4, showing point of binding interaction with the HIV/SIV as position (x) = 18 or CF = 0.0373, and the degree of binding interaction as amplitude of 1.0000, suggesting 100% affinity.
4.5 Cross-spectral feature of the HIV-1 T-tropic viruses, showing point of binding interaction with the HIV/SIV as position (x) = 18 or CF = 0.0354, and the degree of binding interaction as amplitude (y) = 1.0000.
4.6 Cross-spectral feature of the HIV-1 M-tropic viruses, showing weak interaction with the CD4 as the degree of binding interaction, amplitude (y) of 0.001916, at the point of interaction with CD4, position (x) = 18 or CF = 0.0373. A maximum amplitude (y) = 1.00 at position (x) = 51 or CF = 0.1045 discloses a maximum interaction with an as yet unidentified protein.
4.7 Cross-spectral feature of the HIV-2 Isolates, showing weak interaction with the CD4 as the degree of binding interaction, amplitude (y) of 0.008874, at the point of interaction with CD4, position (x) = 18 or CF = 0.0373. A maximum amplitude (y) = 1.00 at position (x) = 182 or CF = 0.3714 reveals a maximum interaction with an as yet unidentified protein.
4.8 Cross-spectral feature of the SIV Isolates, showing weak interaction with the CD4 as the degree of binding interaction, amplitude (y) of 0.1308, at the point of interaction with CD4, position (x) = 18 or CF = 0.0373. A maximum amplitude (y) = 1.00 at position (x) = 120 or CF = 0.2362 suggests a maximum interaction with an as yet unidentified protein.
4.9 The Spectral Characteristics of one isolate from each group, showing (A) high amplitude of 0.9655 (96.55% affinity) for HIV-1 T-tropic HXB3 and (B) low amplitude of 0.2045 (20.45% affinity) for HIV-1 M-tropic YBF30 at the point of interaction with the CD4 (position (x) = 18, CF = 0.0354). This appears to demonstrate high interaction between the HIV-1 T-tropic HXB3 and CD4, and weak interaction between the HIV-1 M-tropic YBF30 and the CD4.
4.10 The Spectral characteristics of one isolate from each group, showing low amplitudes of (A) 0.4045 (40.45% affinity) for HIV-2 Ghana-1 and (B) 0.3782 (37.82% affinity) for SIV GAB1 at the point of interaction with the CD4 (position (x) = 18, CF = 0.0354), which seems to illustrate weak interaction with the CD4.
4.11 The Spectral characteristics of the HIV-1 SF162, showing amplitude of (A) 0.375 (37.50% affinity) for HIV-1 SF162 M-tropic and (B) 0.4108 (41.08% affinity) for HIV-1 SF162 T-tropic at the point of interaction with the CD4 (position (x) = 18, CF = 0.0354). This appears to support the fact that mutation in the M-tropics that brings about transformation into T-tropics results in an increase in affinity.
5.1 The CS analyses of the (A) HIV gp120, (B) CD4, and (C) Multiple CS of gp120 of HIV and SIV showing common point of binding interaction (CF) at position 18, respectively.
5.2 The SC of the three groups, (A) HIV-1 MFA with the highest amplitude of 1.0; (B) HIV-2, ST/24.1C#2 with low amplitude of 0.3413; and (C) SIV, MB66 with also low amplitude of 0.3194, respectively at the point of common binding interaction, CF (position 18 or CF=0.0354).
5.3 The SC of the (A) HIV-1 OYI, (B) Z6 and (C) CDC-451 showing amplitude of 1.00, 0.7347 and 0.6854, respectively at the same point of maximum amplitude, 155 (f=0.305).
5.4 The SC of the (A) HIV-1 SC, (B) 96CM-MP535 and (C) WMJ1 showing amplitude of 0.6842, 0.5499 and 0.4890, respectively at the same point of maximum amplitude, 152 (f=0.299).
5.5 The SC of the (A) HIV-1 VI850 and (B) SIV MB66 showing amplitude of 0.5793 and 0.6926, respectively at the same point of maximum amplitude, 158 (f=0.311).
5.6 The SC of the (A) SIV TAN1 and (B) SIV GAB1 showing amplitude of 1.0000 and 0.8986, respectively at the same point of maximum amplitude, 200 (f=0.394).
5.7 The SC of the (A) HIV ELI, (B) MAL and (C) Z2/CDC-Z34 showing amplitude of 0.6326, 0.6445 and 0.7289, respectively at the same point of maximum amplitude, 150 (f=0.295).
5.8 The SC of the CD4 from (A) Human and (B) Chimpanzee showing amplitude of 0.7235 and 0.7640 at the same point of maximum amplitude, 68 (f=0.141).
5.9 The SC of the CD4 from (A) Dancing Monkey, (B) Green Monkey and (C) Pig-tailed Monkey showing amplitude of 0.6592, 0.7699 and 0.7511, respectively at the same point of maximum amplitude, 101 (f=0.210).
6.1 Protein 1 (Darunavir): Sequences showing mutations in elongated letters. There are 5 mutations in the amino acid sequences of the resistant strain.
6.2 Protein 2 (Bevirimat): Sequences showing mutations in red and elongated letters. There are 3 mutations in the amino acid sequences of the resistant strain.
6.3 Protein 3 (Raltegravir): Sequences showing mutations in red and elongated letters. There are 8 mutations in the amino acid sequences of the resistant strain.
6.4 CIS of the protein residues of both susceptible and resistant strains exposed to Fusion inhibitors (Enfuvirtide) showing amplitude of 1.0 at position 3, indicating 100% Pharmacological affinity for both sequences at the point of interaction.
6.5 IS of the protein residues of the susceptible strain exposed to Fusion inhibitors (Enfuvirtide) showing amplitude of 1.0 at position 3, indicating 100% Pharmacological affinity for amino acid sequences of the susceptible strain at the point of interaction.
6.6 IS of the protein residues of the resistant strain exposed to Fusion inhibitors (Enfuvirtide) showing amplitude of 0.9777 at position 3, indicating 97.77% Pharmacological affinity and resistance of 2.23% for amino acid sequences of the resistant strain at the point of interaction.
6.7 CIS of the protein residues of both susceptible and resistant strains exposed to Protease inhibitor (Darunavir) showing amplitude of 1.0 at position 39, indicating 100% Pharmacological affinity for both sequences at the point of interaction.
6.8 IS of the protein residues of the susceptible strain exposed to Protease inhibitor (Darunavir) showing amplitude of 1.0 at position 39, indicating 100% Pharmacological affinity for amino acid sequences of the susceptible strain at the point of interaction.
6.9 IS of the protein residues of the resistant strain exposed to Protease inhibitor (Darunavir) showing amplitude of 0.8995 at position 39, indicating 89.95% Pharmacological affinity and resistance of 10.05% for amino acid sequences of the resistant strain at the point of interaction.
6.10 CIS of the protein residues of both susceptible and resistant strains exposed to NRT inhibitor (Lamivudine) showing amplitude of 1.0 at position 40, indicating 100% Pharmacological affinity for both sequences at the point of interaction.
6.11 IS of the protein residues of the susceptible strain exposed to NRT Inhibitor (Lamivudine) showing amplitude of 1.0 at position 40, indicating 100% Pharmacological affinity for amino acid sequences of the susceptible strain at the point of interaction.
6.12 IS of the protein residues of the resistant strain exposed to NRT Inhibitor (Lamivudine) showing amplitude of 0.9467 at position 40, indicating 94.67% Pharmacological affinity and resistance of 5.33% for amino acid sequences of the resistant strain at the point of interaction.
6.13 CIS of the protein residues of both susceptible and resistant strains exposed to Maturation Inhibitors (Bevirimat) showing amplitude of 1.0 at position 165, indicating 100% Pharmacological affinity for both sequences at the point of interaction.
6.14 SC of the protein residues of the susceptible strain exposed to Maturation Inhibitors (Bevirimat) showing amplitude of 1.0 at position 165, indicating 100% Pharmacological affinity for amino acid sequences of the susceptible strain at the point of interaction.
6.15 SC of the protein residues of the resistant strain exposed to Maturation Inhibitors (Bevirimat) showing amplitude of 0.9352 at position 165, indicating 93.52% Pharmacological affinity and resistance of 6.48% for amino acid sequences of the resistant strain at the point of interaction.
6.16 CIS of the protein residues of both susceptible and resistant strains exposed to Integrase Enzyme Inhibitors (Raltegravir) showing amplitude of 1.0 at position 67, indicating 100% Pharmacological affinity for both sequences at the point of interaction.
6.17 IS of the protein residues of the susceptible strain exposed to Integrase Enzyme Inhibitors (Raltegravir) showing amplitude of 1.0 at position 67, indicating 100% Pharmacological affinity for amino acid sequences of the susceptible strain at the point of interaction.
6.18 IS of the protein residues of the resistant strain exposed to Enzyme Inhibitors (Raltegravir) showing amplitude of 0.9684 at position 67, indicating 96.84% Pharmacological affinity and resistance of 3.16% for amino acid sequences of the resistant strain at the point of interaction.
6.19 Percentage decrease in Pharmacological activities (Resistance) shown by anti-HIV/AIDS agents (1) Enfuvirtide, (2) Lamivudine, (3) Darunavir, (4) Bevirimat and (5) Raltegravir.
7.1 The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) AMA-1 at position 165 or CF=0.276 and (B) Interleukin at position 64 or CF=0.258.
7.2 The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) the AMA-1 at position 165 or CF=0.276 and (B) PfRON5 at position 333 or CF=0.288.
7.3 The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) CSP Region 11 at position 19 or CF=0.359 and (B) Importin α-3 at position 173 or CF=0.332.
7.4 The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) CSP Region 11 (conserved region) at position 7 or CF=0.368 and (B) CD36 at position or CF=0.346.
7.5 The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) Ring-infected Erythrocyte Surface Antigen (RESA) at position or CF=0.25 and (B) the Spectrin-B at position 985 or CF=0.382.
7.6 The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) HIV gp160 at position 155 or CF=0.185, (B) the HIV gp120 at position 18 or CF=0.0354 and (C) the gp41 at position 64 or CF=0.186.
7.7 The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) HIV gp120 at position 18 or CF=0.0354 and (B) host CD4 at position 18 or CF=0.0373.
8.1 Amino acids sequences, showing two residues (37G and 38R) of the connecting peptide in (A) the 1DF5, and 45 residues (37A-81T) in (B) HIV gp41 core, all indicated on the white background in the middle.
8.2 Structure of (A) the 1DF5 showing short connecting peptide in between the two helices and (B) HIV gp41 core, displaying a long connecting peptide.
8.3 Eisenberg-based scalogram of (A) the 1DF5 (Scale 2) showing one minimum wavelet coefficient (blue) between position 38 and 42 and (B) Wolfenden-based result of HIV gp41 core (Scale 1), showing three minimum wavelet coefficients (blue) that span through position 35 to 85.
8.4
8.5
8.6
9.1 9.2 9.3
Eisenberg-based scalogram of (A) the 1DF5 (Scale 2), demonstrating one minimum wavelet coefficient and (B) HIV gp41 core (Scale 1), showing three minimum wavelet coefficient. . . . . . . . . . . . 177 Fauchere-based scalogram of (A) the 1DF5 (Scale 2)demonstrating one minimum wavelet coefficient and (B) HIV gp41 core (Scale 1), showing three minimum wavelet coefficients. . . . . . . . . . . . . 178 Kyte and Doolittle-based scalogram of (A) the 1DF5 (Scale 2) and (B) HIV gp41 core (Scale 2) . . . . . . . . . . . . . . . . . . . . . 178 Percentage affinities by the four classes of HIV and SIV proteins to CD4+ T cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 Affinities of exemplary isolate from each class to the CD4+ T cells 185 Pharmacological activities of the susceptible (blue) and resistant (red) strains of the HIV target proteins exposed to (1) Enfuvirtide, (2) Lamivudine, (3) Darunavir, (4) Bevirimat and (5) Raltegravir. 189
xx
List of Tables
1.1 Non-homologous Peptides A and D.
1.2 Percentage similarity of the sequences engaged.
2.1 Amino acids, three letter and one letter alphabetic code representations; and the amino acid scales, Electron-ion Interaction Pseudo-potential (EIIP) and Hydrophobicity (WOLR790101).
3.1 Peptides Pep1 and Pep2, extracts from the HIV Gag-pol and gp160, which are known to share a common biological characteristic of binding to a protein, HLA-Cw*0102, are used to demonstrate the RRM procedure.
3.2 Peptides (P18 and P32) used in comparing the potencies of starter materials for Malaria vaccine design.
3.3 The results of the RRM analysis of Pep1 and Pep2 showing the biological information contained in the protein as Spectral Characteristics (SC) and Cross-Spectral (CS) features.
3.4 The normalized results of the RRM analysis of Pep1 and Pep2. The highest value was brought to 100% in order to ease interpretation of the results.
3.5 Numerical values of amino acid scales: CORJ870101 and KYTJ820101.
3.6 Percentage Predicted biological functionalities of P18 and P32, showing input by each amino acid scale and the positions of interaction.
3.7 Prototypic peptide of Sifuvirtide.
3.8 Percentage Predicted biological functionalities of Enfuvirtide, Sifuvirtide and NHR, showing the contribution of each amino acid scale in the pharmacological activities of the drugs as well as their positions of interaction.
3.9 The nDARC peptide and mutant nDARC Y41F engaged in the study of the effect of mutations on the pharmacological activities of drugs.
3.10 Percentage Predicted biological functionalities of nDARC and nDARC Y41F, showing contribution by each amino acid scale, and their positions of interaction.
4.1 HIV-1 T-Tropic isolates and their protein identities.
4.2 HIV-1 M-tropic isolates and their protein identities.
4.3 HIV-2 and SIV isolates and their protein identities.
4.4 Binding capabilities of the group of CD4, HIV and SIV Isolates.
4.5 Binding capacities of the host CD4 to the HIV.
4.6 Affinities of exemplary isolates from each class to the CD4, showing the level of interaction between one isolate from each class and the CD4.
4.7 HIV-1 Dual-tropic isolate and its protein identity.
5.1 Results of the spectral characteristics of the CD4 from the host organisms, demonstrating the degree of the affinity between the HIV and the host species. MP stands for Maximum Peak and NF, Not Found.
5.2 Results of the spectral characteristics of HIV-1 T-tropic viruses, showing the level of binding interaction that exists between each isolate and the host CD4, and the position of maximum interaction. Five isolates have maximum affinity for the CD4. MP: Maximum Peak.
5.3 Results of the spectral characteristics of HIV-1 M-tropic viruses, showing the level of binding interaction that exists between each isolate and the host CD4, and the position of maximum interaction. One isolate, LW123, has maximum affinity for the CD4. MP: Maximum Peak.
5.4 Results of the spectral characteristics of HIV-1 Dual-tropic viruses, showing the level of binding interaction that exists between each isolate and the host CD4, and the position of maximum interaction. MP: Maximum Peak.
5.5 Results of the spectral characteristics of HIV-1 with unknown tropism, showing the level of binding interaction that exists between each isolate and the host CD4, and the position of maximum interaction. MP: Maximum Peak, and NF: Not Found.
5.6 Results of the spectral characteristics of HIV Type 2 isolates, showing the level of binding interaction that exists between each isolate and the host CD4, and the position of maximum interaction. MP: Maximum Peak.
5.7 Results of the spectral characteristics of SIV isolates. MP: Maximum Peak.
5.8 Predicted HIV-1 Tropic and Phenotypic associations. Two isolates with least affinity for the CD4 are predicted to belong to HIV-1 M-tropic and NSI, while two with the highest attraction to the CD4 are envisaged to be part of T-tropic and SI groups.
6.1 Amino acid scales engaged: (1) LIFS790103 [3], (2) WILM950103 [4], (3) BEGF750102 [3], (4) ROBB760104 [5] and (5) EIIP [6].
6.2 Resistance offered by the HIV target proteins of the anti-HIV drugs, obtained as the differences between the amplitudes of the Susceptible and Resistant strains at the points of interaction between the drugs and the target proteins.
7.1 Summary of the computational results from the group of clinically interacting Plasmodial and host proteins.
7.2 CF of the clinically interacting Plasmodial and host proteins from 0.000 to 0.200.
7.3 CF of the clinically interacting Plasmodial and host proteins from 0.201 to 0.300.
7.4 CF of the clinically interacting Plasmodial and host proteins from 0.301 to 0.500.
7.5 Results of the Cross-Spectral analysis of RESA showing the Consensus Frequencies of each domain. N is the length of the longest sequence.
7.6 Results of the Cross-Spectral analysis of Spectrin α and β showing the Consensus Frequencies of each domain. PP stands for peak position or position of the maximum amplitude; N is the length of the longest sequence; and CF refers to the Consensus Frequency.
8.1 The four Hydrophobicity-based amino acid scales engaged: Eisenberg [7], Kyte-Doolittle [8], Fauchere [9] and Wolfenden [10].
8.2 Result of the CWT analysis of the 1DF5 and HIV gp41 using amino acid scales: Eisenberg [7] and Wolfenden [10]. MV: Minimum Value.
9.1 Consensus Frequencies of the four classes of HIV and SIV proteins.
9.2 Maximum amplitude-based prediction of relationships in HIV and SIV isolates across the Atlantic.
9.3 Maximum amplitude-based prediction of relationships in HIV and SIV isolates, and the host species.
9.4 Correlation of the clinical and computational Relationships.
9.5 Result of the CWT analysis of the 1DF5 and HIV gp41 using amino acid scales: Eisenberg and Wolfenden. MV: Minimum Value.
Nomenclature
Acronym        Meaning
AAS            Amino Acid Scale
ABRA           Acidic Basic Repeat Antigen
AIDS           Acquired Immune Deficiency Syndrome
AMA            Apical Merozoite antigens
ARGP820101     Hydrophobicity-based Amino Acid Scale
BURA740101     Alpha Helix-related Amino Acid Scale
CD             Cluster of Differentiation
Celtos         Cell Traversal protein for Ookinetes, Sporozoites
CHAM830107     Charge transfer-based Amino Acid Scale
CHAM830108     Charge transfer-based Amino Acid Scale
CSP            Circumsporozoite Protein
CIS            Common Informational Spectrum
CR             Complement Receptor
CADRC          Computer-Aided Drug Resistance Calculator
CF             Consensus Frequency
CWT            Continuous Wavelet Transform
CS             Cross Spectral
DSP            Digital Signal Processing
DWT            Discrete Wavelet Transform
DARC           Duffy Antigen Receptor Chemokines
DBP            Duffy Binding Proteins
DBL            Duffy Binding-Like
EIIP           Electron-ion Interaction Pseudo-potential
EBA            Erythrocyte Binding Antigen
ENGD860101     Hydrophobicity-based Amino Acid Scale
FASG890101     Hydrophobicity-based Amino Acid Scale
FAUJ880111     Positive-based Amino Acid Scale
FAUJ880112     Negative Charge-based Amino Acid Scale
gp             Glycoprotein
NHR            N-Terminal Heptad Repeat
HIV            Human Immunodeficiency Virus
IS             Informational Spectrum
ISM            Informational Spectrum Method
JURD980101     Hydrophobicity-based Amino Acid Scale
KAHRP          Knob-Associated Histidine-Rich Protein
KLEP840101     Net Charge-based Amino Acid Scale
LRP            Lipoprotein Receptor-Related Protein
LSAs           Liver Stage Antigens
M-tropic       Macrophage-tropic
MACPF          Membrane Attack Complex/Perforin
MAOP           Membrane Attack Ookinete Protein
MSA            Merozoite Surface Antigen
MSPs           Merozoite Surface Proteins
MJ             Moving Junction
MP             Maximum Peak
MSA            Multiple Structural Alignment
MV             Minimum Value
NLS            Nuclear Localization Signal
PfEMP1         Plasmodium falciparum erythrocyte membrane protein 1
PLP            Phospholipases Proteins
PEXEL          Plasmodium Export Element
PfRH           Plasmodium falciparum Reticulocyte Binding Homologues
PECAM-1        Platelet/Endothelial Adhesion Molecule-1
PONP800104     Alpha Helix-related Amino Acid Scale
PRAM900102     Alpha Helix-related Amino Acid Scale
RRM            Resonant Recognition Model
RON            Rhoptries Neck
RESA           Ring-infected Erythrocyte Surface Antigen
SOAP           Secreted Ookinete Adhesive Protein
STFT           Short Time Fourier Transform
SIV            Simian Immunodeficiency Virus
SC             Spectral Characteristics
SALSA          Sporozoite and Liver Stage Antigen
SLARP          Sporozoite And Liver Stage Asparagine-Rich Protein
SPECT          Sporozoite microneme Protein Essential for cell Traversal
SSP            Sporozoite Surface Protein
STARP          Sporozoite Threonine and Asparagine Rich Protein
SI             Syncytium-Inducing
T-tropic       T-Cell Lymphocyte Tropic
TRAP           Thrombospondin-Related Adhesive Protein
TLP            TRAP-Like Protein
VCAM-1         Vascular Cell Adhesion Molecule 1
WOLR790101     Hydrophobicity-based Amino Acid Scale
WT             Wavelet Transform
Chapter 1
Introduction
1.1 The Prologue:
The battle to eliminate harmful organisms and to improve human well-being by correcting anomalies is still ongoing. Disease conditions resulting from invasion by living organisms and viruses, as well as from deteriorating human organs and systems, first require an understanding of their biological mechanisms. This is followed by the design and development of therapeutic interventions such as drugs, vaccines and other medical devices. Research in these areas remains one of the vital avenues for battling these organisms and viruses. The aim of this research is therefore to assess the biological functionalities of proteins, as well as protein-protein interactions, in order to first understand how diseases progress and then study the physiological characteristics of the proteins involved, evaluate existing therapeutic interventions, and design bio-medical apparatuses that will help improve our well-being. Two diseases, namely HIV and Malaria, are investigated using computer-assisted procedures. These techniques are applied to protein residues of the viruses, micro-organisms and their host targets, which are obtained from databases such as [1]. Amino acid sequences from HIV, including the conserved domains of Variable region 3 (V3) of the HIV surface protein (HIV gp120) [2], from Plasmodium (the causative organism of Malaria) and from host organisms are examined using parameters that determine the level of participation of each protein residue in molecular interactions. These sets of parameters are called Amino Acid Scales (AASs) [3]. They are based on the study of the characteristics of the amino acid contents of the proteins, including their physicochemical properties [4].

In this research, no animal tissue, reagent or clinical equipment, however simple, is used. Rather, amino acid sequences of Plasmodial proteins such as Sporozoite Surface Protein (SSP2) [5] and of HIV gp120 are obtained from the literature [6] and analyzed using appropriate AASs. The AASs engaged include the Hydrophobicity-based scales of Eisenberg [7], Kyte-Doolittle [8], Fauchere [9] and Wolfenden [10].

The computer-assisted approaches engaged in these studies differ from what obtained during the "Botanical Age". During that era, therapies were provided by means of serendipitous drug discoveries [11; 12], that is, the discovery of therapeutic agents by chance. The procedures of the "Botanical Age" consisted mainly of crude extracts handed to patients by herbalists. Information on the drugs and their mechanisms of action was not available and could not be provided, and the little knowledge of drug activity acquired by the herbalists was hoarded. Though this method later gave way to conventional clinical assessment [12], the rationality of applying clinical assessment to the evaluation of the bio-functionalities of protein residues has not been adequately addressed. These clinical processes are known to be cumbersome, laborious, expensive and slow [13; 14]. Faster and cheaper approaches are therefore required, namely the computational approaches.

Biological functionalities refer to the responses elicited as a result of bio-molecular interactions in living organisms. Proteins and peptides are the major bio-molecules found in living organisms and viruses [15]. They are also known to be responsible for most biological functionalities in living organisms and viruses [15].
For protein- or peptide-based drugs, conventional clinical techniques exist for the assessment of their bio-functionalities and interactions with their target proteins. These techniques involve the identification and characterization of the active ingredients, pharmacological evaluations and the assessment of synthetic feasibility [16]. Other clinical assessment procedures include synthetic processes, standardization of the active constituents and clinical trials. Dosage formulation and subsequent mass production are the last procedures in the clinical assessment of drugs. Typical conventional clinical assessment procedures include the Antimicrobial Susceptibility Test (AST) [13; 14; 17] and the Analysis of Direct Action of Histamine on the Guinea Pig Ileum [18]. These assessment techniques are known to involve stringent and resource-consuming methodologies and experimental environments that require laborious definitions and detailed documentation of specifications and interpretative criteria [13; 14].

Conventional clinical assessment procedures also have shortcomings. They are laborious, expensive, resource-consuming and time-consuming. This is because clinical approaches to evaluating the biological characteristics of the protein residues of the host target cells, tissues, organs and systems involve reagents that are often expensive and difficult to obtain. Additionally, the amino acid sequences of the proteins and peptides continue to change through mutations. Mutations in the protein residues of invading organisms give them the ability to resist undesirable environments such as drugs. As a result, vast data on mutated sequences are available for study in the attempt to develop therapeutic interventions. For example, one species of Plasmodium, falciparum, has over 4,600 peptides, which are assembled into over 700 proteins [19], and Plasmodium is believed to have 250 species, of which 64 are recognized [20]. Clinical approaches to evaluating the bio-functionalities of such huge data on protein residues from organisms and viruses have become more challenging. In addition, knowledge of diseases and their pathogenesis is needed in order to understand the bio-functionalities of the drug target proteins. This understanding will also assist in identifying the interactions that exist between drugs and their target sites.
Investigating the bio-functionalities of proteins by utilizing their sequence information remains a major step towards understanding disease processes. It is a rational and feasible approach. Such investigations include the progression of HIV infection to AIDS. In this study, two disease conditions, namely HIV/AIDS and Malaria, are considered. HIV/AIDS and Malaria have been recognized as the two top killer diseases [21] and have attracted unprecedented expenditure [22]. They have also proved difficult to manage [23]. HIV/AIDS has remained incurable as a result of the high replicative ability of the virus [24] and its capacity to resist most anti-retroviral agents [23]. Malaria, on the other hand, has eluded cure [25] as a result of resistance acquired by the causative organism, Plasmodium [26], as well as difficulties encountered in eradicating Plasmodia [25]. This research is therefore aimed at investigating the proteins and peptides of the causative organisms of HIV/AIDS and Malaria using bioinformatics approaches, in order to help design and develop a cure. Their life-cycles are briefly described below.

The HIV/AIDS disease process has been described earlier [27]. It starts with the attachment of the virus to the host T-cell. The HIV surface protein (HIV gp120) is fastened to the host CD4 while the HIV transmembrane protein (HIV gp41) is bound to the CCR5. This is followed by a conformational rearrangement that leads to the fusion of the viral genetic content into the host T-cell. By means of an enzyme called Reverse Transcriptase, the single-stranded HIV RNA is copied into DNA, which is integrated into the host genome by the enzyme Integrase to form a "provirus". The provirus is then sliced into parts by the Protease enzyme; these parts later assemble and escape the host cells as new viruses. The replication continues until the CD4 is completely depleted. This leads to the collapse of the immune system, which manifests as AIDS [27].

The second disease condition is Malaria, whose causative organism is Plasmodium. The life-cycle of Plasmodium has been elucidated [28]. Plasmodium invades and destroys the host liver and, thereafter, the red blood cells. This results in the release of toxins that give rise to fever, loss of appetite and the other symptoms of Malaria. In this case, the binding interactions that exist between Plasmodial proteins and the host proteins are studied first.
This is with the aim of developing algorithms that would help assess the numerous Plasmodial proteins and peptides that are becoming challenging for the clinical approaches, and so help narrow the search for a cure. These efforts are targeted at curing and eradicating Malaria.

A computerized bio-medical apparatus called the Computer-Aided Drug Resistance Calculator, for assessing the resistance offered by the target proteins of these viruses and other micro-organisms, is developed with the intention of facilitating the procedure of determining which drug would best apply to the treatment of not only these two disease conditions but others as well. The essence of these studies is to find out how these recalcitrant viruses and micro-organisms could be exterminated. The approach is intended to apply to other organisms too.

In order to carry out these studies without embarking on labour-intensive, resource-consuming, expensive and time-wasting laboratory-based experimentation, and to face the challenges that have besieged the clinical methods of assessing the biological functionalities of proteins and peptides, rational procedures have become relevant and inevitable. These techniques are expected to obtain, assemble, arrange, document, explore, represent and analyze these huge data in order to correlate, reconcile and reveal their biological, medical, behavioural or health significance.

Several bioinformatics methods have been employed in the assessment of biological functionalities. They include CLUSTALW [29], Multiple Alignment using Fast Fourier Transform (MAFFT) [30], MUltiple Sequence Comparison by Log-Expectation (MUSCLE) [31] and T-Coffee [32]. These bioinformatics techniques rely on sequence homology. Another class of bioinformatics methods used for protein sequence analysis is based on the signal processing concept. These procedures consider proteins as signals, which are obtained as numerical sequences translated from the discrete protein residues using numerical values that represent the degree of participation of these residues in the biological interaction being examined. As a result, Digital Signal Processing-based techniques (DSP) [33] are employed in this study. Protein and peptide residues are discrete in nature, because they are amino acid sequences in linear formation [34].
DSP is employed in this study because it treats proteins as signals representing biological properties. These signals result from the translation of the protein residues into numerical sequences using a set of parameters that governs the property being studied. As signals, proteins can then be analyzed using DSP techniques in order to reveal the biological functionalities embedded in the proteins and peptides. DSP, including the DFT-based RRM and ISM, captures the frequency components (points of interaction) of the signal (a biological characteristic of proteins, such as binding interaction) and therefore uncovers the magnitude or strength embedded as amplitudes. On the other hand, time-series oriented DSP approaches like CWT display regions of minimum or maximum coefficients as structural motifs such as connecting peptides or helices.

Programs that rely on homology have helped in the analysis of proteins. Homologous proteins or peptides have, however, been observed to demonstrate divergent biological properties. Similarly, non-homologous proteins or peptides have been identified that share the same biological characteristics [35].

Table 1.1: Non-homologous Peptides A and D
Peptide Identity    Peptide
Peptide A           KQQYYWYAWCQPPQDQLIMD
Peptide D           DDALYDDKNWDRAPQRCYYQ
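The treatment of Peptides A and D as signals can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the EIIP values are those commonly tabulated in the RRM literature and should be checked against Table 2.1, and the function names, the removal of the mean, the zero-padding to the longer sequence and the use of a product of amplitude spectra as the cross-spectrum are all assumptions made for illustration.

```python
import numpy as np

# EIIP values (Rydberg) as commonly tabulated in the RRM literature.
# Assumption: verify against Table 2.1 before relying on them.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}

PEPTIDE_A = "KQQYYWYAWCQPPQDQLIMD"  # Table 1.1
PEPTIDE_D = "DDALYDDKNWDRAPQRCYYQ"  # Table 1.1


def encode(sequence, scale=EIIP):
    """Translate one-letter residue codes into a numerical signal."""
    return np.array([scale[residue] for residue in sequence], dtype=float)


def amplitude_spectrum(sequence, n_points):
    """Amplitude spectrum of the mean-removed signal, zero-padded to n_points."""
    signal = encode(sequence)
    signal = signal - signal.mean()
    # Bin 0 (zero frequency) carries no positional information and is dropped.
    return np.abs(np.fft.rfft(signal, n=n_points))[1:]


def cross_spectrum(seq_a, seq_b):
    """Normalized product of the two amplitude spectra (hypothetical helper)."""
    n_points = max(len(seq_a), len(seq_b))
    freqs = np.fft.rfftfreq(n_points)[1:]  # frequencies in (0, 0.5]
    product = amplitude_spectrum(seq_a, n_points) * amplitude_spectrum(seq_b, n_points)
    return freqs, product / product.max()


freqs, cs = cross_spectrum(PEPTIDE_A, PEPTIDE_D)
peak = int(np.argmax(cs))
print(f"strongest shared frequency: {freqs[peak]:.3f}")
```

The normalization brings the strongest shared component to an amplitude of 1.0, which is how the cross-spectral figures listed in this thesis report the consensus frequency.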
It is demonstrated in that study that the two non-homologous 20-mer peptides, Peptide A and Peptide D (Table 1.1), revealed the same consensus frequencies. This signifies similar bio-functionalities, which afforded them consideration in the attempted design of anti-HIV vaccines. It therefore became obvious that the DSP technique is better suited to revealing shared bio-functionalities than homology-based procedures. For details, refer to Chapter 3 (Methodology), where two non-homologous sequences, Pep1 and Pep2 (Table 3.1), and two sequences of unequal length, P18 and P32 (Table 3.2), are used to demonstrate the relevance of DSP techniques in revealing biological information hidden in proteins and peptides, rather than investigating their biological functionalities on the basis of sequence similarity. Though the sequences and signals of P18 (Figure 3.8) and P32 (Figure 3.9) are divergent, application of the DSP technique revealed a similar biological functionality (binding interaction) at position 18 (Figure 3.10). The entire computing approach engaged is explained in section 3.2.1.

Though the percentage similarities of Peptides A and D (19.0%) and of Pep1 and Pep2 (22.2%), as shown in Table 1.2, remain low, each pair displayed a common biological functionality (binding interaction). Peptides P18 and P32 have an average similarity of 56.2% and still share a common point of affinity at position 18. Though sequences may demonstrate divergence in their alignment, they may still possess similar biological functionalities. This therefore explains why DSP programs are engaged in this study.

Table 1.2: Percentage similarity of the sequences engaged.
Peptide                        Percentage similarity
Peptide A and D                19.0
Pep1 and Pep2                  22.2
Peptide P18 and Peptide P32    56.2

The DSP technique entails conversion of the alphabetic code of the protein residues into numerical values by means of one amino acid scale. The encoded protein residues, which are signals, are processed using the Fourier Transform (FT) in order to obtain informational spectra that describe the biological characteristics of the protein [35; 36]. This process, by which biological characteristics of proteins and peptides are presented as spectra, is called the Informational Spectrum Method (ISM) [36; 37]. A version of this process that uses the Electron-Ion Interaction Pseudopotential (EIIP) scale is called the Resonant Recognition Model (RRM) [35]. Wavelet Transform (WT) is another Digital Signal Processing-based method that has been employed in identifying biological characteristics of proteins and peptides, and in assessing the interactions that exist amongst them [38].

In this study, the mechanism by which HIV infection translates to AIDS is first determined. This is demonstrated by comparing the affinity that exists between the HIV surface protein, gp120, and its host target site, the CD4. The amino acid sequences of these proteins are translated into signals using an AAS that determines the affinity that exists between proteins. This scale is called EIIP [35]. By this approach, these proteins are represented as signals of bio-recognition and binding interaction from the HIV gp120 and host CD4. For example, a protein (P) is no longer seen as a sequence of amino acids such as $P_1 - P_2 - P_3 - P_4 - P_5 - P_6 - P_7 \dots P_n$. It is rather seen as a signal (S) representing the degree of affinity of each residue, such as $S_1 - S_2 - S_3 - S_4 - S_5 - S_6 - S_7 \dots S_n$. Because they are signals, a Discrete Fourier Transform-based DSP procedure is applied. This results in the derivation of the position (F) and the level (L) of affinity between the HIV gp120 from various isolates and the CD4 of the host species. The HIV isolates are grouped into four, namely the HIV type 1 (HIV-1) T-cell lymphocyte loving (T-Tropic), HIV-1 macrophage loving (M-Tropic), HIV type 2 (HIV-2) and Simian Immunodeficiency Virus (SIV). The positions (F) of interaction of the HIV-1 T-Tropic ($F_{\text{T-Tropic}}$), HIV-1 M-Tropic ($F_{\text{M-Tropic}}$), HIV-2 ($F_{\text{HIV-2}}$) and the SIV ($F_{\text{SIV}}$) are compared with that of the CD4 ($F_{\text{CD4}}$). Additionally, the level (L) of interaction of the CD4 ($L_{\text{CD4}}$) and those of the various groups of HIV and SIV isolates, including the HIV-1 T-Tropic ($L_{\text{T-Tropic}}$), HIV-1 M-Tropic ($L_{\text{M-Tropic}}$), HIV-2 ($L_{\text{HIV-2}}$) and the SIV ($L_{\text{SIV}}$), are studied in order to find out which group has the highest affinity for the host CD4. It was observed that $F_{\text{CD4}}$ is similar to $F_{\text{T-Tropic}}$ and that both have high magnitude. This knowledge helped in determining the mechanism of HIV progression to AIDS.

Because this knowledge was not enough to understand how to combat the menace of the disease, the physiological characteristics of the HIV surface protein (HIV gp120) and its target protein, the Cluster of Differentiation 4 (CD4), are studied. First, the positions (F) and levels (L) of interaction between the HIV and SIV isolates are determined. In addition, the positions of maximum interaction (MF) of the isolates, as well as their levels (ML), are obtained. These results are studied. The MF of an American isolate, SC ($MF_{\text{SC}}$), was found to be the same as those of the Zairian WMJ1 ($MF_{\text{WMJ1}}$) and the Cameroonian 96CM-MP535 ($MF_{\text{96CM-MP535}}$), suggesting trans-Atlantic transmission. The MF of the isolates MFA ($MF_{\text{MFA}}$), BH10 ($MF_{\text{BH10}}$), HXB3 ($MF_{\text{HXB3}}$) and HXB2 ($MF_{\text{HXB2}}$) are found to be the same as that of the CD4 ($MF_{\text{CD4}}$), a property which helped categorize them as candidate interactors.
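The comparison of positions (F) and levels (L) can be sketched as follows. This is an illustrative reading only: the three signals are synthetic toy sinusoids rather than encoded gp120 or CD4 sequences, the function name is hypothetical, and the position is taken to be the index of the dominant DFT bin, which is consistent with the way the figure captions pair each position with a consensus frequency.

```python
import numpy as np


def position_and_level(signal):
    """Return (F, L): the dominant non-zero DFT bin and its amplitude."""
    signal = np.asarray(signal, dtype=float)
    amplitudes = np.abs(np.fft.rfft(signal - signal.mean()))
    position = int(np.argmax(amplitudes[1:])) + 1  # bin 0 is skipped
    return position, float(amplitudes[position])


# Synthetic stand-ins (assumption): a "host" signal and two "isolates".
n = np.arange(64)
host = np.cos(2 * np.pi * 8 * n / 64)
isolate_a = 0.9 * np.cos(2 * np.pi * 8 * n / 64)   # same position, lower level
isolate_b = np.cos(2 * np.pi * 20 * n / 64)        # different position

f_host, l_host = position_and_level(host)
for name, sig in {"isolate_a": isolate_a, "isolate_b": isolate_b}.items():
    position, level = position_and_level(sig)
    shares_position = position == f_host
    print(name, position, shares_position, round(level / l_host, 3))
```

In this toy case only the first isolate shares the host's position, and its level relative to the host is 0.9; an isolate whose dominant component lies elsewhere would not be grouped with the host regardless of its amplitude.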
They represent a group of HIV isolates that would be suitable for studies with the CD4. This provided the opportunity to categorize the HIV isolates and identify the associations that exist amongst them, an attribute that would help in the design of new therapeutic interventions.

Several anti-HIV/AIDS agents have been employed in the struggle to eradicate HIV from our systems. This research was therefore further directed to the comparison of the potencies of one peptide-based agent that is already in use (Enfuvirtide) and another that is undergoing clinical trials (Sifuvirtide). The amino acid sequences of these agents and their target proteins are first converted into signals (S) and processed by DFT using appropriate AASs. The common point of interaction (CP) is first obtained. Based on this position, the level of interaction contributed by each agent at the common point of interaction (LCP) is derived using all the AASs and compared. The average level of interaction contributed by Sifuvirtide at the common point of interaction ($LCP_{\text{Sifuvirtide}}$) was found to be higher than that of Enfuvirtide ($LCP_{\text{Enfuvirtide}}$). This is to help assess the drugs that are to be made available in the management of the challenges posed by HIV.

The technique is further engaged to develop a device, called the Computer-Aided Drug Resistance Calculator (CADRC), that would help assess the resistance offered by these organisms and viruses. The target proteins for five anti-HIV/AIDS agents are first identified. The amino acid sequences of these target proteins are obtained from the database [79] and then translated into signals using one appropriate AAS. The signals are then processed using DFT, and the consensus point of interaction (CP) is obtained. Using the CP, the levels of bio-activity (BioA) of the protein residues of the susceptible (BioAS) and resistant (BioAR) strains are derived. The degree of resistance is then derived as:

\[ \text{Resistance} = \mathrm{BioAS} - \mathrm{BioAR} \tag{1.1} \]
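As a worked instance of Equation 1.1, the amplitudes quoted in the captions of Figures 6.14, 6.15, 6.17 and 6.18 can be substituted directly. The sketch below is illustrative only; the function name is hypothetical and the expression of the result as a percentage follows the wording of those captions.

```python
def resistance(bio_as, bio_ar):
    """Equation 1.1: susceptible-strain level minus resistant-strain level."""
    return bio_as - bio_ar


# (BioAS, BioAR) amplitudes at the point of interaction, as quoted in the
# captions of Figures 6.14/6.15 (Bevirimat) and 6.17/6.18 (Raltegravir).
amplitudes = {
    "Bevirimat": (1.0, 0.9352),
    "Raltegravir": (1.0, 0.9684),
}

for drug, (bio_as, bio_ar) in amplitudes.items():
    print(f"{drug}: {100 * resistance(bio_as, bio_ar):.2f}% resistance")
```

This reproduces the 6.48% and 3.16% resistances stated in those captions.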
It is revealed in this study that the resistances obtained clinically correlated with the resistances derived computationally from the sequence information of the susceptible and resistant strains. This helped in proposing a bioinformatics-based biomedical tool called the Computer-Aided Drug Resistance Calculator (CADRC).

Finally, the technique is utilized to identify the structural motifs, called connecting peptides, that separate the helices of two homologous HIV proteins, namely the HIV gp41 core protein and 1DF5. As in the previous experiments, the protein residues of the HIV gp41 core protein and 1DF5 are obtained from the database [79] and translated into signals using an appropriate AAS. Rather than DFT, the Continuous Wavelet Transform (CWT) is applied, and the positions of the Maximum Wavelet Coefficients (MaWC) and the Minimum Wavelet Coefficients (MnWC) are identified. These regions are correlated with the actual positions of the connecting peptides. It is observed that there is a relation between the actual and predicted positions. This demonstrates that the regions that need to be targeted for therapeutic interventions could be detected. The entire set of studies is intended to help battle the threats from HIV/AIDS.

In all, eight experiments are carried out in this study. The amino acid sequences used are obtained from clinically studied experiments; they are not generated. In addition, the AASs engaged have been clinically and computationally validated and deposited in the database [3]. Of the DSP techniques involved, RRM has been authenticated and employed in the study of over 1000 proteins [35], while the others have been utilized for several other proteins and peptides. As a result, the validation is achieved inherently.
1.2
The Main Contributions of the PhD Thesis:
In this research, bioinformatics techniques are employed to study the bio-functionalities of proteins and peptides as well as protein-protein interactions. This is in order to help study disease processes and to design and develop therapeutic interventions and bioinformatics tools. Three Digital Signal Processing techniques are engaged. They are the Resonant Recognition Model (RRM), the Informational Spectrum Method (ISM) and the Continuous Wavelet Transform (CWT). The findings made in this research have contributed in the areas discussed below:
1. Bioinformatics-based Approach to Examining the Mechanism of Disease Processes: HIV Progression to AIDS:
This study helps determine the mechanism by which HIV infection progresses to AIDS using RRM. HIV-1 T-tropic viruses, which predominate in the late stage (the full-blown AIDS phase), are found to possess the highest affinity for the CD4, unlike the HIV-1 M-tropic viruses that control the early sero-conversion and asymptomatic stages. This elevated affinity is found to bring about sustained attachment to and attack on the CD4, and the immune breakdown that manifests as AIDS. Prior to this investigation, propositions for the transformation of HIV to AIDS have centered on immune decline, enhanced replication rate and Syncytium Induction, and the ability of the viruses to infect tumor cell lines. This study is a novel computational approach to examining HIV advancement to AIDS, which has not only added rationality to the assessment but also demonstrated the mode of HIV progression to AIDS. Part of this study has been published in the Current HIV Research journal as: Nwankwo N, Seker H. 2012. “HIV Progression to AIDS: A Bioinformatic Approach to Determining the Mechanism of Action”, (see appendix).
2. Bioinformatics Approaches to Characterizing and Identifying Tropic and Phenotypic Association:
In this study, the binding interaction between the HIV and Simian Immunodeficiency Virus (SIV) and the CD4 is associated with their Tropic and Phenotypic groups. The T-tropic viruses are found to possess high binding interaction. They are also observed to possess the Syncytium Inducing (SI) capacity. This is unlike the M-tropic viruses, which have low affinity for the CD4 and belong to the Non-Syncytium Inducing (NSI) family. Also, HIV-2 and the SIV are shown to be non-progressors in nature, as they exhibited low affinity for the CD4. High affinity for the CD4 is also predicted here for the HIV isolate called MFA, which is known clinically as the most virulent isolate of the HIV. These findings help rationalize the categorization of HIV and SIV into groups. Before now, HIV classification into the CCR5-tropic and the CXCR4-tropic groups has helped in the design and management of anti-HIV/AIDS drugs like Maraviroc [39; 40]. This categorization will also assist in sorting out antiretroviral agents for administration. Part of the findings obtained in this research is prepared for submission as: Nwankwo N, Seker H. 2012. “Bioinformatics-based Approach to Characterization and Identification of HIV Tropic and Phenotypic Association”, (see appendix).
3. Computer-Aided Drug Resistance Calculator (CADRC):
Based on the information presented by the amino acid sequences and scales, the reduced level of interaction identified in this research for the mutated protein residues of the resistant strains of HIV has helped propose a Computer-Aided Drug Resistance Calculator (CADRC). Protein residues of the susceptible and resistant strains of HIV are analyzed by means of the Informational Spectrum Method (ISM), and the resistant strains are found to demonstrate reduced interaction. Previously, drug resistance has been assessed through resource-wasting and time-consuming clinical experimentation. Evaluation of drug resistance by the bioinformatics approach provided in this study has shown that the computational approach is cheaper, faster and easier, yet reliable. This proposal has been published: Nwankwo N, Seker H. 2010. “A signal processing-based bioinformatics approach to assessing drug resistance: human immunodeficiency virus as a case study”. Conf. Proc. IEEE Eng Med Biol Soc. 2010:1836-1839 (see appendix).
4. Categorization of Species using the Resonant Recognition Model (RRM):
The method engaged in this study has helped classify the HIV and its host species based on the point of maximum interaction. A close association between human and Chimpanzee has already been identified [41]. This study also recognized that both species share the position of maximum interaction (maximum amplitude of the spectrum). Amongst the HIV isolates, the study identified the Zairian isolate, WMJ1, and the Cameroonian stock, 96CM-MP535, as sharing common biological functionality with an American isolate (SC). This study has therefore helped confirm that the computational approach engaged is reliable for categorizing species. Isolates like MFA, BH10, HXB2 and HXB3 are found to share the same position of maximum interaction (maximum amplitude of the spectrum) with the CD4 and are regarded as HIV/AIDS candidate interactors. Based on this feature, they are envisaged to offer better results in the study of HIV/AIDS. Before now, Phylogenetic tree programs have been used for the classification. A Fourier Transform-based program for the classification of species with regard to the position of maximum interaction is expected to emerge from this study. Part of the findings obtained in this research is prepared for submission as: Nwankwo N, Seker H. 2012. “Bioinformatics-based Approach to Characterization and Identification of HIV Tropism and Phenotypic association”, (see appendix).
5. Identification of Evolutionary Trends of Organisms by means of the Resonant Recognition Model (RRM):
It is also revealed in this research that isolates of the HIV from different continents share the same position of maximum interaction (maximum amplitude in the spectrum). For example, the HIV isolates OYI, obtained from Gabon, and Z6, from Zaire, are found to share the same position of maximum interaction with an American isolate called CDC-451. This study tends to suggest trans-Atlantic transmission, pointing out that this computational approach can be successfully applied to establish evolutionary trends of organisms. Also, the case for ape-to-human cross-species transmission appears to be strengthened using these bioinformatics procedures. Earlier, techniques like Phylogeny, Comparative ontogeny and Experimental biology have served as methods of analyzing evolutionary trends [42]. This RRM-based procedure, which uses the point of maximum interaction to identify closely related species irrespective of location, has been shown to be reliable in charting out evolutionary road-maps for organisms. Part of the findings obtained in this research is to be submitted as: Nwankwo N, Seker H. 2012. “Bioinformatics-based Approach to Characterization and Identification of HIV Tropic and Phenotypic Association”, (see appendix).
6. Determination of Potencies of Drugs and Starter Materials for Vaccines Prior to Development:
Using amino acid sequence and scale information, the degrees of affinity that exist between two Plasmodial peptides that are being evaluated for use as starter materials in the production of Anti-Malarial vaccines (peptides P18 and P32) are assessed and delivered numerically. This information is necessary as it is responsible for the specificity of the antigen-antibody interaction that accounts for the potency of vaccines. In addition, ISM is engaged to determine the potency of the vaccine in terms of sensitivity. Additionally, the degrees of interaction existing between two HIV Fusion Inhibitors, namely Enfuvirtide (T-20), an FDA-approved anti-HIV/AIDS drug, and Sifuvirtide, which is still being studied for possible use in the management of HIV/AIDS, and their target protein, the Heptad Repeat (HR), are computed. Computational results obtained in this experiment are found to correlate with preliminary clinical findings. This study therefore demonstrates the computability of Bio-Functionalities. Drug and vaccine development always goes through the process of stepwise clinical trials.
They are no longer discovered by means of serendipity. There is a need to make these tedious and expensive processes cost-effective. Potencies are also evaluated through clinical experimentation. Bioinformatics approaches that determine the potencies of starter materials or finished products computationally, prior to development or production, will help rationalize and optimize clinical approaches to drug discovery and development. The procedure engaged is one such approach. These findings have been published as: Nwankwo N, Seker H. 2011. “Digital Signal Processing Techniques: Calculating the Biological Functionalities of Proteins”. J. Proteomics Bioinform 4: 260-268. doi:10.4172/jpb.1000199, (see appendix).
7. Prediction of Binding Interactions Between Plasmodial and Target Proteins Using the Resonant Recognition Model (RRM):
Using Plasmodial and host proteins as a case study, the clinically established binding interactions between them are computationally assessed to find out whether binding residues could be identified using bioinformatics methods. RRM has been applied to analyze and predict viral proteins [35; 36] and proteins associated with Cancer [43] - [44], Diabetes [45] and Anthrax [37]. This suggests that binding interactions, and all other forms of interaction existing between proteins of other species, can be predicted and the method used to develop large-scale predictive algorithms. This approach remains more rational than the clinical methods, especially in this post-genomic era. Another contribution offered by this research is that it has helped strengthen the understanding that both clinical interactions and computational assessments are sequence-content-dependent. Several approaches have been used to predict binding interactions in proteins. They include HotPatch [46], which is used to detect the surface regions of proteins responsible for biological activities, and ProteMot [47], which harmonizes surface structure with template domains obtained from known complexes.
JET [48] engages both evolutionary tree analysis and the physicochemical surface properties of the protein. An RRM-based procedure for the prediction of binding interactions in all other species will contribute to the study of biological structures and functionalities. This technique can also be developed to cater for large-scale predictions. These findings have been published as: Nwankwo N, Seker H. 2011. “Preliminary Investigations into the Binding Interactions between Plasmodial and Host Proteins Using Computational Approaches”. J Proteomics Bioinform 4: 269-277. doi:10.4172/jpb.1000200, (see appendix).
8. Identification of the HIV gp120 Variable region 3 (V3) Motif-targeted Therapeutic Intervention for the HIV/AIDS Disease:
Mutations in the V3 domain of the HIV are found to bring about the progression of HIV infection to AIDS. This is a result of the increased affinity for the CD4 of the HIV-1 T-tropic viruses, which take over the viral population during the CD4-depleted, full-blown late stage. They are mutants of the HIV-1 M-tropic viruses that abound during the first stage of the HIV infection. Based on this finding, this research recommends the introduction of therapeutic interventions for HIV/AIDS that would target the mutations at the HIV V3 motif that help HIV-1 M-tropic viruses transform into T-tropic viruses. The design and development of anti-retroviral agents that target the V3 region of the HIV at the asymptomatic and sero-conversion stages have not been addressed. This may be because the HIV progression to AIDS as a consequence of mutations at the V3 motif has not previously been investigated. With this finding, it has therefore become essential to develop an anti-HIV/AIDS cure that would prevent the mutations at the V3 region that lead to the transformation of HIV infection into AIDS. Part of this study has been published in the Current HIV Research journal as: Nwankwo N, Seker H. 2012. “HIV Progression to AIDS: A Bioinformatic Approach to Determining the Mechanism of Action”, (see appendix).
9. Continuous Wavelet Transform-Based Identification of the Connecting Peptides of HIV gp41 and 1DF5:
This study helped identify the connecting peptides of HIV gp41 and 1DF5 using the CWT approach. It also demonstrated a proportionality between the actual and predicted lengths of the connecting peptides. CWT methods of assessing the biological functionalities and structures of proteins have not been adequately utilized. The results obtained in this investigation are promising, and this approach has great potential for application in other such studies. This research work is to be submitted as: Nwankwo N, Seker H. 2012. “Continuous Wavelet Transform-Based Study of the Connecting Peptides: HIV gp41 and 1DF5 as a Case Study”, (see appendix).
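The CWT step behind the ninth contribution, locating the maximum and minimum wavelet coefficients of a signal, can be sketched as follows. This is a minimal illustration only: the Mexican-hat (Ricker) mother wavelet, the scale range and the helper names (`ricker`, `cwt_ricker`, `extreme_positions`) are assumptions made for the example, not the wavelet or settings used in this thesis, and the input is a toy spike rather than an encoded protein sequence.

```python
import numpy as np

def ricker(n_points, a):
    """Mexican-hat (Ricker) wavelet sampled at n_points, with width a."""
    t = np.arange(n_points) - (n_points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))

def cwt_ricker(signal, scales):
    """CWT coefficients, one row per scale, computed by direct convolution."""
    x = np.asarray(signal, dtype=float)
    rows = []
    for a in scales:
        n_points = int(min(10 * a, len(x)))
        if n_points % 2 == 0:  # an odd length keeps the wavelet centred
            n_points -= 1
        rows.append(np.convolve(x, ricker(n_points, a), mode="same"))
    return np.array(rows)

def extreme_positions(coeffs):
    """(scale index, position) of the maximum and of the minimum coefficient."""
    max_idx = np.unravel_index(np.argmax(coeffs), coeffs.shape)
    min_idx = np.unravel_index(np.argmin(coeffs), coeffs.shape)
    return tuple(int(i) for i in max_idx), tuple(int(i) for i in min_idx)

# Toy signal (not sequence data): a single spike at position 20.
signal = np.zeros(64)
signal[20] = 1.0
coeffs = cwt_ricker(signal, scales=[1, 2, 3, 4])
mawc, mnwc = extreme_positions(coeffs)
print(mawc, mnwc)
```

For a real sequence, `signal` would be the residue-by-residue numerical encoding, and the reported positions would then be compared with the known positions of the connecting peptides.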
1.3
PhD Thesis Structure
In this research, eight investigations are carried out. For clarity, this thesis is structured into nine chapters in such a manner that each investigation is exhaustively reported in one chapter. However, three experiments are engaged in the demonstration of the methodologies. They are reported in Chapter Three.
Chapter One introduces the research. It explains the research background and briefly mentions the rationale behind the research. It also discusses the experiments carried out, the results obtained and the contributions from the study.
Chapter Two details the preliminary studies. This is referred to as the Literature Review. Previous related works are explored to better understand the aims and background of the research as well as the methodologies engaged.
Chapter Three discusses and demonstrates the methodologies engaged using the Plasmodial peptides P18 and P32, the Plasmodial host peptides including DARC and nDARC, and the two HIV Fusion Inhibitors Enfuvirtide and Sifuvirtide. The methodologies used for the analysis of proteins and peptides are RRM, ISM and WT.
Chapter Four reports the findings made in the study of the transformation of HIV to AIDS using RRM. This is titled “HIV Progression to AIDS: A Bioinformatic Approach to Determining The Mechanism of Action”.
Chapter Five presents the outcome of another investigation titled “Resonant Recognition-Based Characterization and Identification of Human Immunodeficiency Virus Tropic and Phenotropic Association”.
Chapter Six gives an account of the determination of the resistance arising from the exposure of five anti-retroviral agents to their HIV target proteins using ISM, sequence information and amino acid scales.
Chapter Seven describes the outcome of a computational approach to predicting the Biological Characteristics of Plasmodial and host proteins.
Chapter Eight deals with the application of the CWT Method in identifying the connecting peptides of the two helices belonging to the HIV gp41 core and its crystallographic product 1DF5.
Chapter Nine reports the critical analysis made from the experiments and also provides the conclusions arrived at in the research. These are followed by the appendices, which consist of all the published works.
Chapter 2
Literature Review
2.1
Preamble
This chapter first considers the conventional clinical techniques for investigating Biological functionalities and the problems associated with these approaches. It further describes the problems arising from the clinical approaches engaged in the study of proteins and peptides. The chapter also explains why Bioinformatics approaches are engaged to evaluate the Bio-functionalities of proteins and peptides. This is a step toward understanding disease processes and designing and developing drugs and vaccines. Preliminary investigations relating to the Bioinformatics approaches engaged in the study of the Biological functionalities of proteins and peptides using sequence information and an amino acid scale (AAS) are first re-assessed and then used to establish the research background and intentions. The methodologies engaged in these initial studies are reviewed. This is to help justify the methodologies engaged. While relevant terminologies like Bioinformatics, Proteins and Peptides are explained, processes such as the Resonant Recognition Model (RRM), the Informational Spectrum Method (ISM) and the Wavelet Transform (WT) Method are briefly described. The Discrete Fourier Transform (DFT), which forms the basis of RRM, is also explained.
2.1.1
Conventional Clinical Investigations
Prior to the computational approaches, the assessment of the biological functionalities of the protein and peptide constituents of cells, tissues, organs and organisms is known to have involved laboratory-based experimentation. Biological functionalities have been assessed by means of conventional clinical methods such as the Antimicrobial Susceptibility Test (AST) [13; 14; 17]. These techniques are recognized to involve manual procedures and have proven to be expensive, labour-intensive and resource-consuming, and therefore irrational [13; 14; 49; 50]. Unlike computational outcomes, the results of clinical investigations are recognized to be influenced by environmental factors like acidity (pH), moisture and temperature, and by the strains of the organisms [17]. To buttress the irrationality of the conventional clinical investigations of biological functionalities, the methodologies of two clinical experiments are briefly described in this study. They are the Analysis of the Direct Action of Histamine on the Guinea Pig Ileum [18] and the Antimicrobial Susceptibility Test using Disk Diffusion [13; 14; 17]. The Analysis of the Direct Action of Histamine on the Guinea Pig Ileum [18] is a typical conventional clinical approach to assessing a biological characteristic (Histamine-induced contraction of the guinea pig ileum). It involves cutting out 5 cm of guinea pig ileum, which is then submerged in an aerated 20 ml bath containing Krebs solution at 37 degrees Celsius, where it is tested with various concentrations of Histamine prepared in 0.2 ml to 0.5 ml of saline solution. The contractions of the ileum are then recorded on a smoked drum with an isotonic lever. This is in order to measure the cholinergic properties of Histamine. The Antimicrobial Susceptibility Test (AST) is another conventional clinical approach to assessing a biological functionality, in this case drug resistance or susceptibility. AST techniques include the Disk Diffusion, Broth and Agar Dilution Methods [13; 14; 17]. The Disk Diffusion Method involves the diffusion of a given concentration of an antimicrobial agent from preparations like tablets, strips or disks, which are placed on a solid culture medium such as an agar plate inoculated with an isolated inoculum (bacterium) obtained from a pure stock. Results are obtained by measuring the zones of inhibition. The diameters of the zones of inhibition are known to be inversely proportional to the
Minimum Inhibitory Concentration (MIC) [13; 14; 51]. The MIC is the lowest concentration of the anti-microbial agent at which no growth or multiplication of the organism is observed [52]. As recognized in these two experiments, though briefly described, clinical approaches are saddled with the wastage of materials such as reagents, and with test requirements including the laborious definition and detailed documentation of specifications and interpretative criteria [13; 14]. The selection of appropriate materials, such as antimicrobial agents, from the readily available large stock, as in the case of AST, has been an arduous task [13; 14; 51]. Stringent and resource-consuming methodologies are also noted to be involved in the clinical assessments. They include Lyophilization and Cryogenic preservation. Results obtained through laboratory experiments are also affected by several factors. In AST for example, outcomes are affected by the source of the inoculum; the carrier of the antimicrobial agent, such as strips or disks; the testing environment, such as temperature, humidity and carbon dioxide; and at times, human error [13; 14]. These explain the irrationalities associated with the conventional clinical approach to assessing biological characteristics. The post-genomic era, which has witnessed the generation of a large volume of data, has left Bioinformaticians with the task of retrieving, stockpiling, categorizing, documenting, evaluating or representing these huge biological data through the exploration, creation or utilization of computational apparatus and procedures, for the purposes of inventing new biological perceptions and creating comprehensive viewpoints [53]. For example, investigation into the Human genome revealed that there are about 20,000-25,000 protein-coding genes. Over 30,000 unique protein sequences are present in the human proteome, of which one percent has been successfully targeted with small drug molecules [54]. It has also been disclosed that one species of Malaria-causing parasite, Plasmodium falciparum, has about 4,611 peptides from which 728 physiologically active proteins are assembled [19]. More new proteins are created daily by all living organisms and viruses in an attempt to accommodate hazards from other organisms and an unfriendly environment, including exposure to drugs. These vast and increasing protein and peptide data need to be investigated continually in search of chemotherapeutic agents and chemo-preventive measures. This has become
increasingly challenging. Considering that the proteomic and genomic data have grown due to mutations in living organisms and viruses, it is becoming increasingly difficult for the clinical approaches to be effectively engaged. This therefore calls for Bioinformatics approaches.
2.1.2
Analysis of Protein Functionalities Using Sequence Information
Computational approaches to studying the biological functionalities of proteins using amino acid sequence information have been recognized to involve three methods, namely the Alignment, Genome and Expression techniques [55; 56]. Alignment techniques are known to depend on the degree of sequence homology and employ programs including ClustalW [29], Multiple Alignment using Fast Fourier Transform (MAFFT) [30], MUltiple Sequence Comparison by Log-Expectation (MUSCLE) [31] and T-Coffee [32]. Unfortunately, non-homologous proteins have been identified that demonstrate common biological functionalities. The Genomic procedures, on the other hand, engage the information contained in the full sequences of the gene to derive biological functionalities. Genomic procedures include Phylogenetic tree analysis [55; 56]. The third approach is the Expression method, which refers to the measurement of gene expression using Microarray methodologies [57]. It has been employed in Microarray Expression and Mutation analyses, and in Comparative Genomic hybridization [55; 56]. Expression techniques have helped in the discovery of new genes, their functionalities and their levels of expression and, as a result, have assisted in the identification of emerging diseases [55; 56]. They have also helped correlate therapeutic responses and have therefore aided the discovery of new drugs. They have also helped in the study of the effects of toxins in the body, which might originate from drugs in use or from toxins introduced into the body [55; 56]. Apart from these procedures, the application of computational methods such as Digital Signal Processing (DSP) has provided another dimension in the computational assessment of the biological functionalities of proteins as presented by
their amino acid sequence information [33]. In these procedures, the amino acid sequences are transformed into signals by converting their alphabetic codes into numbers using an AAS. Using DSP techniques, the protein functionalities are then presented as peaks or informational characteristics [58; 59].
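The two steps just described, converting the alphabetic codes into numbers with an AAS and then reading the result off as spectral peaks, can be sketched as follows. This is a minimal illustration under stated assumptions: the EIIP values are copied from Table 2.1, the function names (`encode`, `spectrum`, `peak_frequency`) are hypothetical, and the eight-residue string is a toy input, not a sequence studied in this thesis.

```python
import numpy as np

# EIIP values as listed in Table 2.1 of this thesis.
EIIP = {
    "A": 0.0373, "R": 0.0959, "N": 0.1263, "D": 0.0036, "C": 0.0829,
    "Q": 0.0761, "E": 0.0058, "G": 0.0050, "H": 0.0242, "I": 0.0000,
    "L": 0.0000, "K": 0.0371, "M": 0.0823, "F": 0.0946, "P": 0.0198,
    "S": 0.0829, "T": 0.0941, "W": 0.0548, "Y": 0.0516, "V": 0.0057,
}

def encode(sequence, scale=EIIP):
    """Turn a one-letter amino acid sequence into a numerical signal."""
    return np.array([scale[res] for res in sequence.upper()], dtype=float)

def spectrum(signal):
    """Frequencies (cycles per residue) and DFT amplitudes, mean removed."""
    x = signal - signal.mean()
    amplitude = np.abs(np.fft.rfft(x))
    frequency = np.fft.rfftfreq(len(x), d=1.0)
    return frequency, amplitude

def peak_frequency(sequence, scale=EIIP):
    """Frequency at which the amplitude spectrum of the sequence peaks."""
    frequency, amplitude = spectrum(encode(sequence, scale))
    return float(frequency[np.argmax(amplitude)])

toy = "ARARARAR"  # toy input with an obvious period of two residues
print(encode(toy)[:2], peak_frequency(toy))
```

Passing the scale as a parameter keeps the encoding step independent of the property under study, so another AAS can be substituted without changing the transform.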
2.2
The Basis of Bioinformatics Approaches in Investigating the Bio-functionalities
Clinical approaches to assessing biological functionalities are known to involve laborious definitions, detailed documentation of specifications and interpretative criteria, and the selection of appropriate materials, like antimicrobial agents, from a vast source. These have made the techniques cumbersome [49; 50]. On the other hand, the number of mutated peptides and protein residues that are continually re-appraised for their clinical significance is ever-increasing. It has become challenging to assess this vast data using time-wasting and resource-consuming clinical approaches. In order to rationalize clinical procedures, computational approaches are therefore required. These approaches are the Bioinformatics techniques. To effectively replace clinical techniques with the computational procedures and further predict biological functionalities, there is a need to correlate the outcomes of the experiments obtained using these techniques. In this research, these computational methods are engaged in the study of the biological functionalities of proteins and peptides.
2.2.1
Bioinformatics: Definition and the Rationale for the Approach
According to the NIH Working Definition of Bioinformatics and Computational Biology, Bioinformatics is defined as the investigation, improvement or employment of computational devices and techniques for intensifying the use of biological, medical, behavioural or health data which include obtaining, assembling, arranging, documenting, exploring, or representing such data [53]. It has been
described as the field of science in which biology, computer science and information technology are merged together for the purposes of discovering new biological products [53]. Bioinformatics is intended to proffer understanding as well as generate a widely acceptable standpoint from which associated principles in biology can be recognized [60]. One of the Bioinformatics techniques used in the analysis of the proteins and peptides employed in the design of drugs and vaccines is Digital Signal Processing (DSP) [33; 35]. In order to better understand the applications of DSP, a description of proteins, peptides and their therapeutic application in modern-day treatment is first provided. Additionally, the concept of Amino Acid Scales (AAS) is described. Two scales with the descriptor names Electron-Ion Interaction Pseudo-Potential (EIIP) [35] and Hydrophobicity Index [10] are explained (Tab. 2.1). The DSP techniques are then explained.
2.2.2
Protein/Peptidic Therapeutics
Proteins are amino acids in linear sequences [34], and Peptides are protein fragments consisting of fewer than 50 amino acid residues [61]. The activities of proteins and peptides are recognized to go beyond nutritional relevance, as they are noted to also exhibit physiological activities including anti-microbial and anti-hypertensive activities [62]. In nature, proteins exist as sequences. In essence, they occur in discrete units and may be regarded as digitalized or discretized entities. Some protein residues emerge in linear formation while others appear as branched. As a result of their discrete attribute, they benefit from DSP-based techniques such as the Discrete Fourier Transform (DFT) and the Wavelet Transform (WT). In the DSP technique, protein residues, which are represented with alphabetic codes, are first converted to numerical sequences (signals) using one AAS that governs the functionality that is investigated. These signals are further processed using DSP-based methods. The place of proteins and peptides in therapeutic application has been recognized [63] - [67]. Several naturally occurring proteins and peptides that are known to elicit pharmacological actions have been identified. They include mammalian lactoferrin, Bactenecins, Defensins, Indolicidins, Cathelicidins and Beta-Defensins, which are recognized to exhibit antibacterial activity [65; 66]. Proteins and peptides produced by Bacteria, which have demonstrated the ability to alter physiological changes in other living organisms and are being employed for self-protection, include Microcin from Escherichia coli, Mersacidin produced by Lactococcus, Nisin and Epidermidin [64] - [67]. Other viral-based peptides and proteins with some pharmacological activities include the anti-HIV/AIDS agent called T-20 [68]. Plants have peptides like Defensins [66], while amphibians secrete Bombinin, Magainins, Dermaseptins, Buforin II, Ranalexin and Brevinins [63; 67], and insects also produce peptides, which include Cecropins and Melittin [63]. These peptides exhibit anti-bacterial and anti-fungal activities and are employed in self-preservation [63; 67].
Table 2.1: Amino acids, their three-letter and one-letter alphabetic code representations, and the amino acid scales Electron-Ion Interaction Pseudo-Potential (EIIP) and Hydrophobicity (WOLR790101).
Amino Acid       Three-Letter Code   One-Letter Code   EIIP     WOLR790101
Alanine          Ala                 A                 0.0373    1.12
Arginine         Arg                 R                 0.0959   -2.55
Asparagine       Asn                 N                 0.1263   -0.83
Aspartic acid    Asp                 D                 0.0036   -0.83
Cysteine         Cys                 C                 0.0829    0.59
Glutamine        Gln                 Q                 0.0761   -0.78
Glutamic acid    Glu                 E                 0.0058   -0.92
Glycine          Gly                 G                 0.0050    1.20
Histidine        His                 H                 0.0242   -0.93
Isoleucine       Ile                 I                 0.0000    1.16
Leucine          Leu                 L                 0.0000    1.18
Lysine           Lys                 K                 0.0371   -0.80
Methionine       Met                 M                 0.0823    0.55
Phenylalanine    Phe                 F                 0.0946    0.67
Proline          Pro                 P                 0.0198    0.54
Serine           Ser                 S                 0.0829   -0.05
Threonine        Thr                 T                 0.0941   -0.02
Tryptophan       Trp                 W                 0.0548   -0.19
Tyrosine         Tyr                 Y                 0.0516   -0.23
Valine           Val                 V                 0.0057    1.13
2.2.3
Amino Acid Scale (AAS)
Amino Acid Scales (AASs) have been defined as any group of 20 numbers that demonstrate any of the experimentally derived or mathematically computed physiological, structural and biochemical properties of the amino acids [3]. Proteins consist of 20 amino acids or residues [61]. Amongst these 20 amino acids, nine, which cannot be synthesized in the body, are called essential. Six amino acids, which can be synthesized in the body, are referred to as non-essential, while five, which can only be synthesized under limited circumstances, are known as semi-essential amino acids [69]. These 20 amino acids have demonstrated different degrees of involvement in various biological interactions, which have been calculated as AASs [3]. There are over 565 AASs [70]. Two of these AASs are discussed below. AASs are therefore parameters that determine the mechanisms of bio-molecular interactions in the body, including the processing of food, the actions of and resistances to drugs, and every other biological activity of living organisms and viruses.
2.2.3.1
Electron-Ion Interaction Pseudo-Potential (EIIP)
Electron-Ion Interaction Pseudo-Potential (EIIP), shown in Tab. 2.1 is one of the AASs. It is generally engaged by the RRM [35]. Proteins’ molecules are known to be sequences of amino acids with unrestricted electrons and charges [71]. These charges elicit short-lived polarization of the side-chain groups and result in electromagnetic oscillation between some parts of the protein molecule [72]. These oscillations in the protein molecules interfere with one another [73; 74]. During oscillation, molecules which share same biological characteristics are found to resonate at the same frequency leading to
amplified attraction or affinity [72] - [74]. The reverberation arising from electromagnetic oscillation between bio-molecules (electromagnetic resonance) is called the EIIP [35] - [37]. RRM, which engages EIIP, therefore determines biological characteristics of the protein residues in terms of bio-recognition, which signifies specificity in the binding interaction [35]. These are the earliest steps in every molecular interaction. Calculation of the EIIP has previously been achieved [35; 73].
2.2.3.2
Hydrophobicity Index
Free energy has been linked to the removal of amino acid side chains from water or their introduction to it [10]. This has also been found to relate to the distribution of amino acids between the interior and exterior regions of the protein globules. Wolfenden investigated the relationship between the tendency to introduce side chains of amino acids to water (Hydration Potential) and the distribution of amino acids (protein residues) in both the interior and exterior regions of the protein globules [10]. As a result of this investigation, a set of values was obtained, which is called the Hydrophobicity Index. This Hydrophobicity Index (Tab. 2.1) demonstrates the degree of participation of each amino acid side chain in the interaction with water.
2.2.4
Protein Sequencing
Protein Sequencing refers to the process of identifying the amino acid components (protein residues) as well as their order of occurrence (arrangement), as was first achieved with the two insulin chains [75]. The process has evolved from elementary techniques like Edman degradation and Mass Spectrometry [75] to the modern, faster and automated procedures like Shotgun Protein Sequencing (SPS) [76]. Techniques like Polymerase Chain Reaction have been introduced into the procedure to improve the process [77]. Application of these modern, faster and automated sequencers has helped obtain the amino acid components of proteins. As the need to uncover protein structures and bio-functionalities arose amongst the fast mutating viruses and organisms, the
data that now exist for investigation, which will help develop therapeutic interventions, escalated. Bioinformatics approaches for the storage, retrieval and utilization of this proteomic and genomic information have been designed. Tools have also been developed for analyzing the sequence information in order to obtain therapy. Sequence information is stored in databases like GenBank [78], UNIPROT [79] and the Protein Data Bank (PDB) [80]. Other specialized databases include Viralzone [81] and BIND, which deals with protein-protein interaction [82]. Bioinformatics Centres also offer integrated databases where information is combined from different sources. This is obtained in PROSITE [83]. Bioinformatics analytical tools employed in the analysis of sequences include BLAST [84] and FASTA [85]. They are tools that are based on homology and similarity of sequences, and they are helpful in obtaining sequence data sets [86]. In analyzing sequences, tools such as ClustalW2 [87] are employed for multiple sequence alignment, as is T-Coffee [32], which provides for the combination of results from more than one algorithm. MUltiple Sequence Comparison by Log-Expectation (MUSCLE) [31] and Multiple Alignment using Fast Fourier Transform (MAFFT) [30] are improved sequence alignment tools. There are several other tools which are being engaged in the study of protein interactions and bio-functionalities. In this study, however, the DSP technique is utilized. This is because it considers proteins as signals, since they consist of amino acid constituents in discrete formation, which it decomposes to reveal the biological information contained in them.
2.3
Digital Signal Processing (DSP)
DSP is a technique which decomposes signals in an attempt to reveal information embedded in them [88]. It is a means of analyzing continuous and discrete signals. Proteins and peptides are treated as discrete signals because their residues occur as separate (discrete) units. DSP has enhanced and simplified the process of analyzing the characteristics of protein residues, as well as the interactions that exist amongst them, by presenting these characteristics as spectra [33] - [36], [89] - [90].
DSP has helped identify biological properties of proteins [35], such as the degree of binding interaction between proteins [35; 91] and the resistance offered by organisms that are exposed to various drugs [92]. This is in contrast to other procedures like Genotyping, which do not predict but only detect mutation [23]. In DSP-based approaches, protein residues are first converted into numerical sequences (signals) using the one of the 565 available AASs that governs the biological functionality under investigation [70]. It is understood in this study that each AAS depicts a biological functionality. The choice of AAS depends on the biological characteristics being investigated [35]. These numerical sequences (signals) are then processed using DFT, and analyzed by means of the Informational Spectrum Method (ISM) [36; 37; 93]. A DSP-based approach to designing drugs is one of the in-silico design techniques that evolved from earlier serendipitous and irrational drug discovery procedures. Unlike MSA, which dwells on homology [94], DSP techniques calculate biological functionalities, which they present as numerical values using one AAS such as EIIP. DSP techniques are adopted for this research because they present biological functionalities as spectral information, which is then analyzed to obtain its biological relevance. The DSP techniques which are engaged in this research are briefly described in this chapter. They are RRM and ISM. Another DSP technique which is also used in this study is the Continuous Wavelet Transform (CWT) Method [38; 88].
2.3.1
Discrete Fourier Transform (DFT)
Fourier Transform (FT) is defined as the decomposition of a time-based signal into frequency-based sinusoids with the intention of providing clearer information embedded in the signal [38]. Mathematically, it is the sum over all time of the product of the signal and a complex exponential, which has real and imaginary components. For a continuous signal, the Fourier Transform is expressed as:
$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt \qquad (2.1)$$
Here $f(t)$ is the time signal and $e^{-j\omega t}$ the complex exponential, where e is the base of the natural exponential, j is the imaginary unit, $j = \sqrt{-1}$, ω is the angular frequency, and t signifies time. In the case of continuous signals such as sound, integration ($\int$) is involved, whereas discrete signals, including protein residues, require summation ($\sum$) [38]. Discrete Fourier Transform (DFT) applies to digitized signals [38; 88], such as protein residues, and it is expressed as:
$$X(n) = \sum_{k=0}^{N-1} x(k)\, e^{-j 2\pi k n / N} \qquad (2.2)$$
where k is the discrete time (sequence position) index and runs from 0 to N-1, i.e. 0, 1, 2, . . . , N-1; n is the discrete frequency index, which is evaluated from 0 to N/2, i.e. 0, 1, 2, . . . , N/2, because the output of the Discrete Fourier Transform (DFT) of a real-valued signal is a mirror (symmetric) image about N/2. x(k) represents the k-th member of the numerical series, N is the length of the numerical series, and X(n) stands for the DFT coefficient.
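As an illustration of Eq. (2.2), the following minimal Python sketch evaluates the DFT directly from its definition and shows the mirror symmetry of the output. The function name `dft` and the four-point test signal are illustrative assumptions and are not taken from this study.

```python
import cmath

def dft(x):
    """Direct O(N^2) evaluation of X(n) = sum_k x(k) * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N))
        for n in range(N)
    ]

# Hypothetical real-valued test signal (not protein data).
signal = [1.0, 2.0, 0.0, -1.0]
spectrum = dft(signal)

# For a real-valued input, X(N - n) is the complex conjugate of X(n),
# so the magnitudes are mirrored about N/2 and only n = 0..N/2 is needed.
magnitudes = [abs(c) for c in spectrum]
print(magnitudes)
```

Because of this symmetry, only the first half of the coefficients carries independent information, which is why the spectra in this work are read up to N/2.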
2.3.2
Resonant Recognition Model (RRM)
2.3.2.1
Introduction
RRM has been described as a DSP-based physico-mathematical technique which treats the protein primary structure as a sequence of numbers assigned from the EIIP AAS [35; 36; 37; 45]. EIIP governs bio-recognition and binding interaction. Prior to the introduction of RRM, protein residues were presented as numerical spectra by means of the Informational Spectrum Method (ISM) [89; 95]. However, when the functionalities of proteins, presented as numerical spectra, were further found to demonstrate a common peak as a consequence of common functionalities, RRM was introduced to identify their target proteins through the demonstration of one characteristic frequency [35; 95; 96]. RRM has benefited the study of several proteins from numerous functional groups. They include proteins like Anthrax Protective Antigen (PA), Anthrax Toxin Receptor/Tumour Endothelial Marker 8 (ATR/TEM8) and Capillary Morphogenesis Protein 2 (CMG2) [37]; the cancer suppressor proteins p53 and pR13
that resulted in Oncophage vaccine developments [43]; Heat Shock Proteins (HSP) [97]; the tumour-activating DNA virus Simian vacuolating virus 40 or Simian virus 40 (SV40) and its enhancer [37]; the hemagglutinin (HA) membrane glycoprotein of the influenza virus [43; 97]; and the hormone prolactin [45]. In addition, it has helped in the discovery and development of therapies for diseases linked to these proteins. Such diseases include Swine Flu [93], Cancer [43; 44; 97], Diabetes [45], Anthrax [37] and HIV/AIDS [35; 98].
The RRM procedure, like other FT-based DSP techniques, commences with the conversion of the alphabetic code of the amino acid sequences into numerical values using the Electron-Ion Interaction Pseudo-Potential (EIIP) AAS [35]. This is followed by the up-sampling or zero-padding of these numerical sequences using zero values so as to bring them to the same window length. These two procedures transform the amino acid sequences into numerical signals of equal length. Spectral Characteristics (SC) of the signals, namely the proteins, are then obtained by using the DFT. Because direct DFT calculation is time-consuming, a faster algorithm called the Fast Fourier Transform (FFT), which does not alter the outcome of the DFT processing, is used. The Spectral Characteristics (SC) of the signals, which are plots of amplitude against frequency using the absolute values of the complex (real and imaginary) DFT outputs, disclose the information embedded in the protein residues. The x-axis (Frequency) determines the position of the bio-recognition and binding interaction. The y-axis (Amplitude) signifies the contribution of the sequence in terms of the degree of bio-recognition and binding interaction, which is a function of affinity. This is because the AAS engaged in the RRM technique, EIIP, is governed by the electromagnetic resonance arising from oscillation and spinning amongst interacting protein molecules, which causes them to bio-recognize and pull together (affinity) [35].
Cross-spectral (CS) analysis, which refers to the point-wise multiplication of the DFT-processed signals [99], is then performed. It helps identify the common information contained in these protein residues [35]. This common biological activity is symbolized by the Consensus Frequency (CF), which is produced by the region of the sequence that contributes to the functionality.
2.3.3
Informational Spectrum Method (ISM)
ISM refers to the DSP-based procedure that can engage any Amino Acid Scale (AAS). Like other DSP techniques, ISM starts with the conversion of the alphabetic code of the amino acid sequences into numerical values using the appropriate AAS that relates to the interaction under investigation. An example is the helix conformation-oriented inhibition of the HIV fusion protein by fusion inhibitors like Enfuvirtide, as employed in calculating drug resistance [92]. These numerical sequences (signals) are further processed using DFT in order to represent the biological characteristics of the protein as spectral features. A plot of the absolute values of the complex DFT is called the Informational Spectrum (IS). The y-axis (Amplitude) signifies the biological functionality, such as drug resistance, arising from each protein residue, while the x-axis (Frequency) determines the position of the interaction examined [35]. The next step in the ISM procedure is to determine the Common Informational Spectrum (CIS). When comparing proteins with common biological functions, like the anti-HIV/AIDS inhibitory activities examined here, point-wise multiplication of the DFT-processed signals from all the sequences studied provides common information about them. This is called the Common Informational Spectrum (CIS) [36]. The CIS helps disclose the common biological activity signified by the CF. This depicts the region of the sequence that contributes to the functionality examined.
2.3.4
Wavelet Transform (WT) Method
WT is a signal processing-based technique that has helped project data from one domain to another so as to disclose hidden information [100]. In the WT method, the protein residues are first converted into signals. They are further transformed into several groups of coefficients in order to reveal the characteristics of the data at various scales. These coefficients, which measure the degree of similarity between the encoded signals and the chosen waveform, are presented as a scalogram [38; 88]. WT engages two experimental procedures, namely Continuous Wavelet Transform (CWT) and Discrete Wavelet Transform (DWT). These procedures are detailed in Chapter 3. Techniques such as Inverse Fourier Transform (IFT) [44] and Short Time Fourier Transform (STFT) [101] have been acknowledged to have helped identify protein residues that contribute to biological functionalities. However, WT remains a better tool and has also been gainfully employed in the identification of secondary structures of proteins [7; 9; 102]. This is because it detects the protein active sites as domains or regions [44]. Unlike the STFT, WT analyzes the entire protein sequence, from which it highlights the regions of high or low energy at a particular frequency as the protein residues that contribute to a particular biological characteristic [88]. It also allows investigation across the entire frequency range. This is because STFT uses a single analysis window, while WT employs short windows at high frequency to obtain sharper information and long windows at low frequency to study the entire length of the protein, but with low resolution [103].
WT has played major roles in the prediction of active sites of proteins, including Oncogene [43], Prolactin [45], Epidermal Growth Factor (EGF) of Mouse, Human Beta Haemoglobin, Prolactin and Tuna Heart Cytochrome C [35], Human alpha Haemoglobin, HIV gp160, HIV gp120 and HIV gp41 obtained from the HXB2 isolate [35; 104] and also from the HIV LAV isolate [98], and the non-coding domains of the DNA [105]. Apart from the retrieval of the constituent protein residues and positions of the helices in proteins, and of the short peptides connecting the helices and the beta strands [9], WT has helped the study of the hydrophobic protein cores [106]. Determination of the structural families to which protein residues belong has been achieved by means of WT [107]. Additionally, the locations and topology of the helices in transmembrane proteins have been obtained using WT [108].
Other helical regions have been identified using WT, and the periodicity of 3.6 residues has been noted to indicate a high tendency for the protein residues to form an alpha helix [109]. This is to say that one complete alpha helix turn (cycle) consists of 3.6 amino acids [102; 109]. WT has been employed in the identification of the characteristic features of nucleotides, the study of Chinese hamster cells [110], as well as the coding
and non-coding sequences [111]. WT has been described as a suitable tool for short-lived events and localized sequences [112]. By means of WT, the formation of the primary and secondary protein structures has been well identified [102; 109]. Hydrophobicity and Hydropathy scales have been recognized to have played important roles in the formation of alpha helices, beta strands and connecting peptides [102; 113]. Fig. 2.1 is an example of a scalogram of the CWT analysis of the Hepatitis C Virus Core Protein. Small scales (low scales) capture local details with high resolution (shown brighter), while large scales pick up large-scale attributes but with low resolution [100]. WT is therefore a multi-scale analytic process [114].
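The multi-scale behaviour described above can be sketched with a direct, unoptimized CWT. This is only an illustration under stated assumptions: the real-valued Morlet form exp(-t^2/2)cos(5t), the three scales, the function names and the impulse test signal are all choices made for this sketch, not the settings used in this study.

```python
import math

def morlet(t):
    """Real-valued Morlet mother wavelet (one common form, assumed here)."""
    return math.exp(-t * t / 2.0) * math.cos(5.0 * t)

def cwt(signal, scales):
    """Return rows of coefficients C(a, b) = (1/sqrt(a)) * sum_k x[k] * psi((k - b)/a)."""
    n = len(signal)
    rows = []
    for a in scales:
        row = []
        for b in range(n):
            acc = sum(x * morlet((k - b) / a) for k, x in enumerate(signal))
            row.append(acc / math.sqrt(a))
        rows.append(row)
    return rows

# Hypothetical test signal: a single localized event at position 10.
signal = [0.0] * 21
signal[10] = 1.0

scales = [1.0, 2.0, 4.0]
coeffs = cwt(signal, scales)  # one row per scale: the data behind a scalogram

# At every scale, the largest |coefficient| sits at the position of the event.
peak_positions = [max(range(len(row)), key=lambda b: abs(row[b])) for row in coeffs]
print(peak_positions)
```

Plotting the absolute values of `coeffs` with position on the x-axis and scale on the y-axis would give a scalogram of the kind shown in Fig. 2.1.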
Figure 2.1: Scalogram of J6 HCV core protein showing the maximum wavelet coefficient (red) representing the position of the protein hydrophobic regions like helices, and the minimum wavelet coefficient (blue) signifying the hydrophilic domains like the connecting peptides. Hydrophobic Amino Acids Scale is engaged in the analysis.
Preliminary studies identified that WT has several advantages over FFT [43; 115]. They include the fact that Wavelet analysis is capable of revealing aspects of data that other signal analysis techniques miss, including trends, breakdown points, discontinuities, and self-similarity [38]. Secondly, WT is also often used to
compress or denoise a signal without any appreciable degradation, and is found to be more efficient and faster than Fourier methods in capturing the essence of data [43; 115]. The WT of a signal is recognized to perform better than the FT when the signal under consideration contains discontinuities and sharp spikes [43; 115]. IFT and STFT have been acknowledged to help identify only the number of protein residues that contribute to a particular biological functionality [44]. However, WT is known to pick out the protein active sites, which consist of domains or regions within the protein molecule [44]. Several analyzing wavelets, which are also called Mother wavelets, have been identified, including Haar, Daubechies, Coiflet, Symlet, Meyer, Morlet and Mexican Hat [116]. However, the Morlet wavelet has been found to be most appropriate for the identification of active sites [104; 112] and, as a result, it is engaged in this study.
This chapter has described conventional clinical assessment techniques and the problems associated with clinical evaluations in order to help rationalize the employment of computational approaches to studying biological functionalities. In brief, it has also defined and explained relevant terminologies, described the methodologies and justified the methods chosen for specific experiments. The subsequent chapter discusses the methodologies in detail.
Chapter 3
Methodology
3.1
Introduction
In this chapter, the Resonant Recognition Model (RRM), Informational Spectrum Method (ISM) and Wavelet Transform (WT) Method are described and demonstrated in order to show how they are used to assess biological functionalities and interactions amongst proteins and peptides. RRM and ISM are demonstrated using two peptides designated Pep1 and Pep2 (Tab. 3.1), which are known to bind to a protein called HLA-Cw*0102 [117]. In addition, CWT is explained using a synthetic peptide called 1AL1. Part of these investigations has been published (see footnote 1). Four investigations are then used to demonstrate how biological functions are assessed. First, the Plasmodial peptides P18 and P32, which are currently being studied for use in the development of anti-malaria vaccines, are engaged in this study. This is primarily to determine the Specificity of the vaccine. Specificity of the vaccine is characterized by the level of bio-affinity between the antigen (foreign body) and the antibody produced by the host, a characteristic that can be assessed using RRM. It assesses inter-molecular interactions between the Plasmodial and host proteins involved. Thereafter, ISM is used to assess the Sensitivity of the vaccine obtained from the two peptides, using the amino acid scales found to be involved in the interaction. In this
1 Nwankwo N, Seker H. 2011. Digital Signal Processing Techniques: Calculating the Biological Functionalities of Proteins. J. Proteomics Bioinform 4: 260-268. doi:10.4172/jpb.1000199.
case, intra-molecular interactions, which precede the binding and mixing of the molecular contents of the Plasmodial and host proteins involved, are evaluated. The intra-molecular interactions may involve more than one physicochemical or structural property and, as a result, more than one AAS. Sensitivity of the vaccine is governed by interactions inside the two organisms, in this case the charge transfer-based interactions. The Plasmodial peptides P18 and P32 are engaged. Another investigation involves the use of both RRM and ISM to compare the potency of two anti-HIV/AIDS drugs, Enfuvirtide and Sifuvirtide. In the third investigation, sequence alteration (mutation) is used to demonstrate the biological effect of one point mutation, by comparing the Duffy Antigen Receptor Chemokines (DARC) peptide and its mutant nDARC (Y41F) using appropriate amino acid scales (AASs). This is necessary because the method is used to assess drug resistance in the five anti-HIV/AIDS drugs. DARC is a Plasmodial host protein found in humans and other species.
Digital Signal Processing (DSP) techniques are analytic procedures which decompose and process signals in order to reveal information embedded in them [88]. The signals may be continuous, or discrete such as the protein residues. DSP techniques have helped present protein interactions and bio-functionalities as spectral features [35]. This approach is different from Multiple Sequence Alignment (MSA), which dwells on homology [94]. By means of these approaches, protein residues are first converted into numerical sequences (signals) using one of the 565 available Amino Acid Scales (AASs) [70], each of which is responsible for a particular biological functionality. AASs describe physicochemical characteristics such as Hydropathy [8], Hydrophobicity [4; 103] and Hydrophilicity [118; 119], and structural features including Alpha-Helix [120; 121] and Beta conformations [121].
These numerical sequences (signals) are then processed by means of Discrete Fourier Transform (DFT) in order to present the biological characteristics of the proteins in the form of a spectrum. This procedure is called Informational Spectrum Method (ISM) [33; 36; 93]. A version of the ISM which engages a specific amino acid scale, the Electron-Ion Interaction Pseudo-Potential (EIIP), is called Resonant Recognition Model (RRM). In this procedure, biological functionalities are presented as Spectral
Characteristics (SC) [35]. Where there exists a common biological characteristic, a common point of interaction called the Consensus Frequency is demonstrated [35]. This physico-mathematical process is based on the fact that bio-molecules with the same biological characteristics recognize and bind to one another when their valence electrons oscillate and reverberate in an electromagnetic field [35].
3.2
The Methodologies
3.2.1
Resonant Recognition Model (RRM)
3.2.1.1
Explanation
RRM is a DSP-based technique which represents protein primary structures, and hence their physiological functionalities and interactions, as protein residues encoded by numbers assigned from the Electron-ion Interaction Pseudo-potential (EIIP) scale [35]. EIIP is one of the amino acid scales. It is engaged by the RRM in order to determine the biological characteristics of the protein residues in terms of bio-recognition (specificity) and binding interaction (affinity) [35]. This is the first step in every molecular interaction. The values of EIIP are given in Tab. 2.1, which is explained in Chapter 2. In this chapter, a detailed explanation of RRM is provided and demonstrated by using two peptides, each 9 amino acids long, designated Pep1 and Pep2 (Tab. 3.1). Pep1 is obtained from the HIV Gag-pol, while Pep2 is an extract of the HIV gp160. They are known to share a common biological characteristic: both are recognized to bind to a protein, HLA-Cw*0102 [117]. The entire procedure is illustrated in Fig. 3.1.
Table 3.1: Peptides Pep1 and Pep2, extracts from the HIV Gag-pol and gp160, which are known to share a common biological characteristic of binding to a protein, HLA-Cw*0102, are used to demonstrate the RRM procedure.
Peptide Identity    Peptide
Pep1                VIPMFSALS
Pep2                CAPAGFAIL
Figure 3.1: Illustration of the Resonant Recognition Model (RRM), Informational Spectrum Method (ISM) and Wavelet Transform (WT) procedures.
The steps involved in the RRM analysis are as follows:
• Step 1: Conversion of the Protein Residues into Numerical Values using the Electron-ion Interaction Pseudo-potential (EIIP) Scale
First is the conversion of the amino acid sequences into signals using EIIP. There are 20 standard amino acid constituents (protein residues) [122]. As explained in Chapter 2, biological interactions have been studied in relation to the behaviour of these 20 amino acid constituents of proteins. As a result, the level of participation by these protein residues in over 565 protein-protein interactions that characterize biological functions has been derived as AASs and deposited in databases such as [3] and in the literature [123]. EIIP calculation in organic molecules has been demonstrated [36] as:
$$W = \frac{0.25\, Z^{*} \sin(1.04\, \pi Z^{*})}{2\pi} \qquad (3.1)$$
in which case $Z^{*}$ represents the average quasivalence number (AQVN). $Z^{*}$ is assessed as:
$$Z^{*} = \frac{1}{A} \sum_{i=1}^{x} y_i Z_i \qquad (3.2)$$
where $Z_i$ stands for the valence number of the i-th atomic component, $y_i$ is the number of atoms of the i-th atomic component, x is the number of atomic components possessed by the molecule, and A is the total number of atoms involved. The values of the EIIP are expressed in Rydbergs (Ry) [93].
• Step 2: Zero-padding/up-sampling
In some cases, proteins that are to be analyzed by means of RRM have unequal numbers of residues. Signal processing techniques require that the window length of all proteins be the same [88]. In the case of the peptides Pep1 and Pep2, which are of equal length (9) as shown in Tab. 3.1, zero-padding or up-sampling is not required. However, in analyses where the protein residues are unequal in length, as in the case of the Plasmodial peptides P18 and P32 (Tab. 3.2), 14 zeros are added to the peptide P18 so as to bring
it to the same sequence (window) length as the longest sequence (N), which is P32, prior to the decomposition of the numerical signals by means of the Discrete Fourier Transform (DFT). Zero-padding is usually used to improve the visual clarity of the spectrum. It does not change the quality of the spectral results [124].
Table 3.2: Peptides (P18 and P32) used in comparing the potencies of starter materials for Malaria vaccine design.
Peptide Identity    Peptide                             Sequence length
P18                 EWSPCSVTCGNGIQVRIK                  18
P32                 IEQYLKKIKNSISTEWSPCSVTCGNGIQVRIK    32
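The zero-padding of Step 2 can be sketched as below. This is a minimal illustration: the helper name `zero_pad` is hypothetical, and the numeric signals are placeholders (1.0 per residue) standing in for the amino acid scale values, which are not reproduced here.

```python
def zero_pad(signals):
    """Append zeros so that every signal matches the longest one (window length N)."""
    n_max = max(len(s) for s in signals)
    return [list(s) + [0.0] * (n_max - len(s)) for s in signals]

P18 = "EWSPCSVTCGNGIQVRIK"
P32 = "IEQYLKKIKNSISTEWSPCSVTCGNGIQVRIK"

# Placeholder signals: 1.0 per residue is NOT a real scale value; it only
# marks which positions come from the peptide and which are padding.
placeholder_signals = [[1.0] * len(P18), [1.0] * len(P32)]
padded_p18, padded_p32 = zero_pad(placeholder_signals)

zeros_added = len(padded_p18) - len(P18)
print(zeros_added)  # 14
```

Padding at the end of the shorter signal is one common choice and is assumed here; the text does not state where the zeros are placed.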
• Step 3: Processing of the Numerical Sequences (Signals) using Discrete Fourier Transform (DFT)
A transform refers to the conversion of one signal into one or more simpler signals with the purpose of studying the content of the signal, without altering the information embedded in it [88]. It is the use of an algorithm or mathematical equation to change a series of information from one form to another in order to understand the information contained in it. Fourier Transform is defined as the decomposition of a time-based signal into frequency-based sinusoids with the intention of providing clearer information embedded in the signal [38]. Mathematically, it is the sum over time of the product of the signal $f(t)$ and a complex exponential ($e^{-j\omega t}$), which has real and imaginary components. Given these two components, the Fourier Transform is expressed as:
$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt \qquad (3.3)$$
where e is the base of the natural exponential, ω is the angular frequency, j is the imaginary unit, $j = (-1)^{1/2} = \sqrt{-1}$, and t symbolizes time. Discrete Fourier Transform applies to digitized and repetitive (periodic) signals [38; 88], including protein residues. When the signals are non-repetitive (aperiodic), the technique is called Discrete Time Fourier Transform. Discrete Fourier Transform applies only to infinite signals. However, in order to process finite signals of protein residues, they are considered as an infinite repetition of finite signals and, as a result, finite signals can be analyzed using Discrete Fourier Transform [88]. The 9 amino acid long peptides Pep1 and Pep2 (Tab. 3.1), when translated into signals with a sample number N = 9, range over the points 0-8 (0 to N-1) and can be processed with DFT. As stated earlier, in order to process finite signals of protein residues such as Pep1 and Pep2, they are considered as an infinite repetition of finite signals with a complete circle 2πf of length N with frequency f. Therefore, f = n/N, and a complete circle with the discrete frequency index (n) will be 2πn/N. DFT is expressed as:
$$X(n) = \sum_{k=0}^{N-1} x(k)\, e^{-j 2\pi k n / N} \qquad (3.4)$$
where k is the discrete time (sequence position) index and runs from 0 to N-1, i.e. 0, 1, 2, . . . , N-1; n is the discrete frequency index, which is evaluated from 0 to N/2, i.e. 0, 1, 2, . . . , N/2, because the output of the Discrete Fourier Transform (DFT) of a real-valued signal is a mirror (symmetric) image about N/2. x(k) represents the k-th member of the numerical series, N is the length of the numerical series, and X(n) stands for the coefficient of the DFT. For peptides Pep1 and Pep2 as shown in Tab. 3.1, the DFT is represented as:
$$X(n) = \sum_{k=0}^{8} x(k)\, e^{-j 2\pi k n / N}, \quad N = 9 \qquad (3.5)$$
DFT is the sum of the sine and cosine waves, which are also considered as Real (R) and Imaginary (I) values of a complex number [38; 88]. The cosine waves represent the real values while the sine waves are the imaginary
values. These sine and cosine waves engaged in the DFT are regarded as Basis Functions [88]. The result of the DFT consists of complex components expressed as:
$$X(n) = R(n) + j\, I(n) \qquad (3.6)$$
where n = 1, 2, ..., N/2, and R and I represent the real and imaginary parts respectively. Using Pep1 as an example with k = 0-8, the EIIP values are 0.0057, 0.0, 0.0198, 0.0082, 0.0946, 0.0829, 0.0373, 0.0 and 0.0829, which represent the signal of Pep1 regarding its ability to bio-recognize and bind to other proteins. When this signal is processed using DFT, it yields a set of real and imaginary numbers, which are 0.3314, -0.1169 + 0.0550j, 0.1147 + 0.0572j, -0.0889 + 0.0788j, -0.0490 + 0.0778j, -0.0490 - 0.0778j, -0.0889 - 0.0788j, 0.1147 - 0.0572j and -0.1169 - 0.0550j. In order to determine common biological functionalities amongst proteins, a common characteristic frequency is needed. Absolute values of the signal as well as the real and imaginary parts have been engaged in the study of proteins [125]. The absolute value of a complex number a = b + cj is obtained as $|a| = (b^2 + c^2)^{1/2}$, where b is the real part and c the imaginary part; equivalently, $|a| = (a\, a^{*})^{1/2}$, where $a^{*} = b - cj$ is the conjugate. Note that if x = 3 + 4j, the absolute value would be $|x| = [(3 + 4j)(3 - 4j)]^{1/2} = (3^2 - 12j + 12j - 4^2 j^2)^{1/2}$, and since $j^2 = -1$ this is $(3^2 + 4^2)^{1/2} = 5$. Therefore, taking the DFT value for n = 1, -0.1169 + 0.0550j, as an example, the absolute value is obtained as $[(-0.1169)^2 + 0.0550^2]^{1/2} = (0.01669061)^{1/2} = 0.12919$. The absolute values for Pep1 are therefore: 0.3314, 0.1292, 0.1282, 0.1188, 0.0919, 0.0919, 0.1188, 0.1282 and 0.1292.
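The worked example for Pep1 can be checked numerically with a short sketch. The EIIP values below are the ones quoted in the text for the residues of Pep1; the dictionary and function names are illustrative assumptions, and the DFT is evaluated directly from Eq. (3.4) rather than with an FFT library.

```python
import cmath

# EIIP values for the residues of Pep1, as quoted in the text above.
EIIP_PEP1 = {"V": 0.0057, "I": 0.0, "P": 0.0198, "M": 0.0082,
             "F": 0.0946, "S": 0.0829, "A": 0.0373, "L": 0.0}

def dft(x):
    """X(n) = sum_k x(k) * exp(-j*2*pi*k*n/N), evaluated directly."""
    N = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N))
        for n in range(N)
    ]

pep1 = "VIPMFSALS"
signal = [EIIP_PEP1[residue] for residue in pep1]   # Step 1: encoding
coefficients = dft(signal)                          # Step 3: DFT
absolute = [abs(c) for c in coefficients]           # |X(n)| = sqrt(R^2 + I^2)

for n, (c, a) in enumerate(zip(coefficients, absolute)):
    print(n, f"{c.real:+.4f} {c.imag:+.4f}j", f"{a:.4f}")
```

The coefficient at n = 0 is simply the sum of the signal (the DC component), and the remaining magnitudes are mirrored about N/2, in line with the values listed above.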
Table 3.3: The results of the RRM analysis of Pep1 and Pep2 showing the biological information contained in the protein as Spectral Characteristics (SC) and Cross-Spectral (CS) features.
n    Pep1    EIIP      SC        Pep2    EIIP      SC        CS
0    V       0.0057    DC        C       0.0829    DC
1    I       0.0000    0.1292    A       0.0373    0.0205    0.0026
2    P       0.0198    0.1282    P       0.0198    0.1492    0.0191
3    M       0.0082    0.1188    A       0.0373    0.1008    0.0120
4    F       0.0946    0.0919    G       0.0050    0.0982    0.0090
5    S       0.0829    0.0919    F       0.0946    0.0982    0.0090
6    A       0.0373    0.1188    A       0.0373    0.1008    0.0120
7    L       0.0000    0.1282    I       0.0000    0.1492    0.0191
8    S       0.0829    0.1292    L       0.0000    0.0205    0.0026
The entire results are shown in Tab. 3.3. As shown by the results in the table, the DFT processing of Pep1 and Pep2 yields sets of values which are symmetric [126]. Fig. 3.2 is the symmetric image of the Spectral Characteristic of Pep2. The first value, which is called the average value or DC component, is discarded. It is the product of the zero frequency, with high Real values (Cosine coefficients) and zero Imaginary values (Sine coefficients) [38].
Figure 3.2: Symmetric (mirror) image of the Spectral characteristic of Pep2 as an illustration of a DFT property. Half of the image is therefore engaged in the studies.
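The cross-spectral step summarized in Tab. 3.3 can be sketched as follows. The EIIP values are those listed in Tab. 3.3 for the residues of Pep1 and Pep2; the function and variable names are illustrative assumptions, and the cross-spectrum is taken here as the point-wise product of the two absolute spectra, which is the form that matches the CS column of the table.

```python
import cmath

# EIIP values for the residues of Pep1 and Pep2, as listed in Tab. 3.3.
EIIP = {"V": 0.0057, "I": 0.0000, "P": 0.0198, "M": 0.0082, "F": 0.0946,
        "S": 0.0829, "A": 0.0373, "L": 0.0000, "C": 0.0829, "G": 0.0050}

def spectral_characteristic(sequence):
    """Encode a peptide with EIIP and return |X(n)| for n = 0..N-1."""
    x = [EIIP[residue] for residue in sequence]
    N = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N)))
        for n in range(N)
    ]

sc_pep1 = spectral_characteristic("VIPMFSALS")
sc_pep2 = spectral_characteristic("CAPAGFAIL")

# Discard the DC component (n = 0) and keep only the non-mirrored half.
half = len(sc_pep1) // 2
cross = {n: sc_pep1[n] * sc_pep2[n] for n in range(1, half + 1)}

# The frequency index with the largest cross-spectral amplitude.
consensus_n = max(cross, key=cross.get)
print(consensus_n, round(cross[consensus_n], 4))
```

For these two peptides the largest cross-spectral value falls at n = 2, the row with CS = 0.0191 in Tab. 3.3.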
• Step 5: Analysis of the results
The Absolute Spectrum is a power spectrum (the DFT coefficient raised to the power of 2 in magnitude) and represents the magnitude of the spectrum as an amplitude [35] that describes the degree of interaction, in this case bio-recognition and bio-attachment. It can be found using:
$$S_a(n) = X(n)\,X^{*}(n) = |X(n)|^{2} \tag{3.7}$$
where $S_a$ stands for the Absolute spectrum of a given protein, $X(n)$ represents the DFT coefficient of the signal, $X^{*}(n)$ is its conjugate, and $n = 1, 2, \ldots, N/2$. Though DFT is known to be the most straightforward mathematical procedure [88], it is found to be computationally inefficient. As a result, a faster algorithm called the Fast Fourier Transform (FFT) is used to execute it. FFT only quickens the DFT processing and does not alter the results [127].

The plots of the Absolute values represent the biological characteristics of the two peptides (Pep1 and Pep2) in terms of bio-recognition and bio-attachment. These characteristics are presented as a spectrum because the Amino Acid Scale used is EIIP, which considers interactions as a result of affinity arising from electromagnetic attraction as the valence electrons oscillate [35]. The vertical axis (y-axis), which represents the Amplitude, symbolizes the binding interaction (affinity) of each protein residue, while the x-axis (Frequency) determines the position of the interaction. The vertical axis can be normalized to a maximum value of 1 or 100%. Tab. 3.4 presents the actual and normalized values of the Spectral Characteristics and Cross-Spectral features of peptides (A) Pep1 and (B) Pep2, while Fig. 3.3 is a plot of their Spectral Characteristics. On the horizontal axis of the frequency domain, the maximal frequency (F) is calculated as:
$$F = \frac{1}{2d} \tag{3.8}$$
Table 3.4: The normalized results of the RRM analysis of Pep1 and Pep2. The highest value was brought to 100% in order to ease interpretation of the results.

| n | Pep1 | SC | SC normalized | Pep2 | SC | SC normalized | CS | CS normalized |
|---|------|--------|---------|------|--------|---------|--------|---------|
| 0 | V | DC | | C | DC | | | |
| 1 | I | 0.1292 | 100.00% | A | 0.0205 | 13.73% | 0.0026 | 13.84% |
| 2 | P | 0.1282 | 99.21% | P | 0.1492 | 100.00% | 0.0191 | 100.00% |
| 3 | M | 0.1188 | 91.94% | A | 0.1008 | 67.56% | 0.0120 | 62.60% |
| 4 | F | 0.0919 | 71.15% | G | 0.0982 | 65.80% | 0.0090 | 47.18% |
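The normalization used in Tab. 3.4 and the frequency axis implied by Eq. 3.8 can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, the DFT is computed by direct summation, and unit spacing between residues (d = 1) is assumed so that bin n corresponds to the frequency n/N.

```python
import cmath

def magnitude_spectrum(signal):
    """Return |X(n)| for n = 0..N-1 using a direct DFT."""
    n_points = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for k, x in enumerate(signal)))
        for n in range(n_points)
    ]

def normalise_to_percent(values):
    """Scale values so that the largest one becomes 100%."""
    peak = max(values)
    return [100.0 * v / peak for v in values]

pep1_eiip = [0.0057, 0.0000, 0.0198, 0.0082, 0.0946,
             0.0829, 0.0373, 0.0000, 0.0829]

n_points = len(pep1_eiip)
half = n_points // 2                      # keep n = 1..N/2, drop the DC term
sc = magnitude_spectrum(pep1_eiip)[1:half + 1]
sc_percent = normalise_to_percent(sc)

d = 1.0                                   # assumed unit spacing between residues
max_frequency = 1.0 / (2.0 * d)           # Eq. 3.8
frequencies = [n / (n_points * d) for n in range(1, half + 1)]

for f, s, p in zip(frequencies, sc, sc_percent):
    print(f"f={f:.3f}  SC={s:.4f}  normalized={p:.2f}%")
```

All retained frequencies fall at or below the maximal frequency of 0.5, and the normalized column should match the Pep1 values in Tab. 3.4 to within rounding.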
Figure 3.3: Spectral Characteristics of peptides (A) Pep1 and (B) Pep2, showing the y-axis as the normalized amplitude values (biological information) and the x-axis as the frequency or positions of the biological activities.

In the numerical signal of protein residues, the distances (d) between amino acids are assumed to be equal, which indicates a unit sampling rate. As a result, d = 1 and hence F = 0.5. The horizontal axis can be represented in four different ways. In one of the representations, the horizontal axis presents the amino acid sequence positions of the proteins as the sampling numbers. In another case, the horizontal axis is taken as a fraction of the protein residues ranging from 0 to 0.5.
This fraction lies between the DC and one-half of the length of the protein residues. The horizontal axis can also be expressed in terms of 0 to π, and also as the natural frequency of a sample [88].

• Step 4: Cross Spectral (CS) Analysis
In order to identify the common biological relationship amongst proteins, the SC of the protein residues are point-wise multiplied [99]. This process is called Cross Spectral (CS) analysis [35]. Proteins with a common biological functionality are known to share one significant peak called the Consensus Frequency (CF) [35]. This can also help identify the motifs responsible for the biological functionality [128]. CS, or point-wise multiplication of the Absolute values of the Characteristic Spectra, is represented as:
$$C_a = \prod_{m=1}^{M} S_a(m) \tag{3.9}$$
where $C_a$ represents the absolute value of the CS analysis, $S_a(m)$ is the Absolute spectrum of the m-th sequence, $m = 1, 2, \ldots, M$, and M is the number of protein/peptide sequences engaged. In the case of Pep1 and Pep2, there are only two peptide sequences.
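The point-wise multiplication in Eq. 3.9 can be illustrated with Pep1 and Pep2. The sketch below is not the thesis implementation; the function names are hypothetical, and the spectra are taken as DFT magnitudes for n = 1 to N/2, which is the convention that reproduces the SC and CS columns of Tab. 3.3.

```python
import cmath

def magnitude_spectrum(signal):
    """Return |X(n)| for n = 0..N-1 using a direct DFT."""
    n_points = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for k, x in enumerate(signal)))
        for n in range(n_points)
    ]

def cross_spectrum(spectra):
    """Point-wise product of several equally long spectra (Eq. 3.9)."""
    product = [1.0] * len(spectra[0])
    for spectrum in spectra:
        product = [p * s for p, s in zip(product, spectrum)]
    return product

# EIIP signals from Tab. 3.3 (Pep1: VIPMFSALS, Pep2: CAPAGFAIL).
pep1 = [0.0057, 0.0000, 0.0198, 0.0082, 0.0946, 0.0829, 0.0373, 0.0000, 0.0829]
pep2 = [0.0829, 0.0373, 0.0198, 0.0373, 0.0050, 0.0946, 0.0373, 0.0000, 0.0000]

half = len(pep1) // 2
sc1 = magnitude_spectrum(pep1)[1:half + 1]   # n = 1..4, DC discarded
sc2 = magnitude_spectrum(pep2)[1:half + 1]
cs = cross_spectrum([sc1, sc2])

# 1-based position of the largest cross-spectral amplitude.
peak_position = max(range(len(cs)), key=lambda i: cs[i]) + 1
print([round(v, 4) for v in cs], "peak at n =", peak_position)
```

For these two peptides the largest cross-spectral value falls at n = 2, in agreement with the 100% entry in the CS normalized column of Tab. 3.4.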
Figure 3.4: Cross-Spectral features (normalized) of peptides Pep1 and Pep2, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis.

• Step 5: Analysis of the Results Using Consensus Frequency (CF) and Position of Maximum Amplitude
Two positions can be used to analyze the results. They are the Consensus Frequency (CF) and the position of maximum amplitude. Each position signifies a point of common biological interaction. According to the RRM procedure [35], the relationship between the Consensus Frequency (CF), the longest sequence (N) and the Peak Position (PP) can be expressed as:
$$CF = \frac{PP}{N} \tag{3.10}$$
where N represents the length of the longest protein in the dataset.
The amplitudes attained by protein residues at the CF are useful in determining the relationships that exist between the proteins as well as the organisms that harbour them [35; 36; 129]. Results can also be expressed in terms of percentages [35]. Secondly, a common maximum amplitude appears to suggest a consensus maximum interaction with another protein. This characteristic seems to be peculiar to a group of proteins and, as a result, consensus maximum interaction suggests a common biological functionality. Based on the position of maximum amplitude, organisms are categorized into groups.
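The relationship in Eq. 3.10 amounts to locating the peak of the cross-spectrum and dividing its position by the length of the longest sequence. A minimal sketch follows; the function names are hypothetical, and the input is the CS column of Tab. 3.3 for n = 1 to 4 (Pep1 and Pep2 both have nine residues).

```python
def peak_position(cross_spectrum):
    """1-based index of the largest amplitude; entry 0 corresponds to n = 1."""
    return max(range(len(cross_spectrum)), key=lambda i: cross_spectrum[i]) + 1

def consensus_frequency(peak, longest_length):
    """CF = PP / N (Eq. 3.10), with N the length of the longest sequence."""
    if longest_length <= 0:
        raise ValueError("longest_length must be positive")
    return peak / longest_length

cs_pep1_pep2 = [0.0026, 0.0191, 0.0120, 0.0090]  # CS values from Tab. 3.3
pp = peak_position(cs_pep1_pep2)
cf = consensus_frequency(pp, longest_length=9)
print(f"PP = {pp}, CF = {cf:.3f}")
```

Keeping the peak search separate from the division makes it straightforward to apply the same calculation to the position of maximum amplitude of a single spectrum.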
3.2.2 Informational Spectrum Method (ISM)
Having bound (achieved inter-molecular interaction), the amino acid contents of the peptides are required to interact with the host target proteins (intra-molecular interaction) in order to initiate the production of the immunizing components. Binding is the first step in every interaction. This is called inter-molecular interaction as it occurs between molecules (proteins). After the molecules have adhered to one another, there is a mixture of the contents, which allows the contents to interact. This is called intra-molecular interaction. It depends on the relationship that exists between the molecules, which could be based on structural conformation (such as helix, alpha and beta) or on physicochemical properties like Hydrophilicity and Electrostatic interactions. The degree of intra-molecular interaction determines the bio-activity of the drugs and vaccines.

While RRM is concerned with the molecular interaction, as it uses the Amino Acid Scale (AAS) called EIIP, which is governed by bio-recognition and bio-attachment, ISM makes use of all the AASs. ISM is therefore referred to as a DSP-based technique that considers protein primary structures or physiological functionalities as numbers using any amino acid scale [35; 93; 130]. The ISM procedure has been used to investigate principal arrangement in Calcium binding protein [130] and Influenza viruses [93].

ISM uses the same procedure as the RRM. It involves three main steps and uses any amino acid scale. Like the RRM, the steps include the conversion of the alphabetic code of amino acid sequences into numerical values using an amino acid scale that relates to the interaction under investigation. This is followed by the processing of the numerical sequences (signals) using the Discrete Fourier Transform (DFT). The absolute values of the complex DFT are represented as a plot called the Informational Spectrum (IS), which discloses the information embedded in the protein residues. The y-axis (Amplitude) signifies the biological characteristics such as drug susceptibility or resistance. The x-axis (Frequency) determines the position of biological interaction. The third step entails obtaining the Common Informational Spectrum (CIS). CIS compares the activities of proteins which have common biological functions.
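The three ISM steps described above (encoding, DFT, and point-wise multiplication into a CIS) can be sketched as below. This is a hedged illustration rather than the thesis code: the function names are hypothetical, the two eight-residue sequences are made-up toy peptides, and the scale values are those listed for CHAM830107 in Tab. 3.5 (1.00 for N, D, E and G, and 0.00 otherwise).

```python
import cmath

# CHAM830107 values as listed in Tab. 3.5.
CHAM830107 = {aa: 0.0 for aa in "ARNDCQEGHILKMFPSTWYV"}
CHAM830107.update({"N": 1.0, "D": 1.0, "E": 1.0, "G": 1.0})

def encode(sequence, scale):
    """Step 1: replace each residue letter with its amino acid scale value."""
    return [scale[residue] for residue in sequence]

def informational_spectrum(signal):
    """Step 2: DFT magnitudes for n = 1..N/2 (the DC term is dropped)."""
    n_points = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for k, x in enumerate(signal)))
        for n in range(1, n_points // 2 + 1)
    ]

def common_informational_spectrum(spectra):
    """Step 3: point-wise product of the individual spectra."""
    cis = [1.0] * len(spectra[0])
    for spectrum in spectra:
        cis = [c * s for c, s in zip(cis, spectrum)]
    return cis

# Hypothetical toy peptides of equal length (not taken from the thesis).
toy_a, toy_b = "GDENALKV", "ADGNELKG"
spectra = [informational_spectrum(encode(p, CHAM830107)) for p in (toy_a, toy_b)]
cis = common_informational_spectrum(spectra)
print([round(v, 4) for v in cis])
```

Swapping in a different scale dictionary is the only change needed to repeat the analysis for another physicochemical property, which reflects the point that ISM follows the RRM procedure with any amino acid scale.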
3.2.3 Continuous Wavelet Transform (CWT) Method
This is the third method employed in this research. It is also described in detail and illustrated using a synthetic but amphiphilic peptide called 1ALI. Amphiphilic peptides are peptides with both hydrophilic and hydrophobic constituents [131]. The Continuous Wavelet Transform (CWT) method is used in this study to identify the connecting peptides for the two helices of the HIV Transmembrane protein, gp41, and its crystallographic product, 1DF5. The CWT-based approach is therefore first demonstrated using the amphiphilic peptide 1ALI.

WT is a signal processing-based approach which converts signals from one domain to another in order to reveal hidden information contained in them [100]. WT translates signals, such as numerical representations of the protein residues, into several groups of coefficients so as to disclose the characteristics of the signals at various scales. At high resolutions, WT employs short windows and high frequency to produce sharper information (brighter local details), while large scales are used to spot wide-ranging but unclear features at low resolution [100; 103]. This is unlike the STFT, which uses a single analysis window, and as a result WT is regarded as a multi-scale analytic process [114]. WT and its application in identifying protein secondary structures are detailed in Chapter 1. The Continuous Wavelet-based approach is utilized in this study in an attempt to identify the connecting peptides of the 1DF5 and HIV gp41 Core protein. The procedure is described and demonstrated as shown below.
The steps involved in the CWT have previously been described [38; 43; 98; 104]. The first step is to translate the alphabetic code of the protein residues into signals using an appropriate AAS. The signal is then decomposed by means of CWT. The CWT of protein residues is defined as the sum over all the signals s(t) of the protein residues, multiplied by the scaled and shifted version of the wavelet (ω), as shown in Eq. 3.11. This yields wavelet coefficients that are functions of scale and amino acid position [38]. For the scale (a) and protein residue (b), the coefficient of the signal s(t) has been expressed [102] as:
$$C_{a,b} = \int_{\mathbb{R}} s(t)\,\frac{1}{\sqrt{a}}\,\omega^{*}\!\left(\frac{t-b}{a}\right)dt \tag{3.11}$$
The superscript asterisk represents the conjugate component, indicating that the output of the CWT analysis consists of real, imaginary and absolute values. $\omega\left(\frac{t-b}{a}\right)$ is the wavelet shifted by a protein residue position over a scale, s(t) is the signal (numerical sequence), a is the scale and b is the protein residue position. The result of the CWT-based processing of a signal is a Wavelet Coefficient, which is a measure of the degree of similarity between the numerical signal and the analyzing (mother) wavelet chosen [38; 116]. Mother wavelets which have been engaged in the analysis of signals include Haar, Daubechies, Coiflet, Symlet, Meyer, Morlet and Mexican Hat [116].

A portion of the signal obtained from the protein residues is processed using a mother wavelet, and the process is then shifted to the right until all the protein residues are covered. This procedure can be repeated using a scaled or stretched wavelet in order to obtain more information. The CWT Coefficients obtained are then represented as plots referred to as scalograms (Fig. 3.7). This helps to analyze the information embedded in the proteins. CWT is called continuous because, in CWT analysis, the wavelet is shifted to the right continually until all the protein residues are analyzed [38].

Identification of the location of the alpha helix structure in a peptide called 1ALI is used to demonstrate CWT analysis. 1ALI, as shown in Fig. 3.5, is a 12 amino acid-length synthetic peptide with an amphiphilic (both hydrophilic and hydrophobic) alpha helix structure [109; 132]. It has been clarified that AASs which are related to Hydrophobicity and Hydropathy play prominent roles in the formation of alpha helices, beta strands and connecting peptides [113]. CWT-based identification of the position and protein residues responsible for the formation of the alpha helix structure in 1ALI has been carried out in [109]. In this demonstration, one AAS [8] with descriptor name KYTJ820101, as shown in Tab. 3.5, is employed. It is first used to translate the alphabetic codes of the protein residues into signals, which are further analyzed using CWT.

Figure 3.5: Alpha-helix structure of peptide 1ALI (derived from [1]).
Figure 3.6: Amino acid sequence and KYTJ820101-based signal of the 1ALI, showing the helix structure as the shaded area.

• Step 1: Conversion of the protein residues into numerical values of the KYTJ820101 Scale
First, the 12 amino acids of the 1ALI sequence are translated into a numerical sequence using the KYTJ820101 scale, which is amongst the scales presented in Tab. 3.5. The amino acid sequence and the KYTJ820101-based signal of 1ALI are shown in Fig. 3.6.

• Step 2: Continuous Wavelet Transform-based processing of the 1ALI Signal
The KYTJ820101-based signal of 1ALI is then processed by means of CWT and the results are presented as a Scalogram in Fig. 3.7.
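The two steps above can be sketched in code as a discretized form of Eq. 3.11. This is an illustrative sketch under stated assumptions: the Mexican Hat wavelet is chosen only because it is one of the mother wavelets named in the text, the 12-residue sequence is a hypothetical amphiphilic example and is not the 1ALI sequence, and the function names are not from the thesis. The KYTJ820101 values are those listed in Tab. 3.5.

```python
import math

# KYTJ820101 hydropathy values as listed in Tab. 3.5.
KYTJ820101 = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mexican_hat(t):
    """Real-valued Mexican Hat (Ricker) mother wavelet."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt(signal, scales):
    """Discretized Eq. 3.11: one row of coefficients per scale a,
    one column per residue position b. The wavelet is real, so its
    conjugate equals the wavelet itself."""
    coefficients = []
    for a in scales:
        row = [
            sum(s * mexican_hat((t - b) / a) for t, s in enumerate(signal))
            / math.sqrt(a)
            for b in range(len(signal))
        ]
        coefficients.append(row)
    return coefficients

# Hypothetical 12-residue amphiphilic peptide (illustration only).
toy_peptide = "EKLLKELLKKLG"
signal = [KYTJ820101[residue] for residue in toy_peptide]  # Step 1
scalogram = cwt(signal, scales=[1.0, 2.0, 3.0])            # Step 2

for a, row in zip([1.0, 2.0, 3.0], scalogram):
    print(f"scale {a}:", [round(c, 2) for c in row])
```

Each row of `scalogram` corresponds to one horizontal band of a scalogram plot such as Fig. 3.7; positions with the largest and smallest coefficients are the candidates for hydrophobic and hydrophilic regions respectively when a hydrophobicity-type scale is used.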
Figure 3.7: KYTJ820101-based scalogram of the 1ALI obtained through CWT-based analysis, showing the predicted regions in the form of maximum wavelet coefficient (red) and minimum wavelet coefficient (blue). Note: An overlap of the red and blue, a characteristic of the amphiphilic nature of 1AL1, resulted in green colouration between positions 6 and 8.

The Amino Acid Scale (AAS) used, KYTJ820101, is a hydropathy (Hydrophobicity-related) scale. Hydrophobic protein structures such as helices are identified at the maximum value of the wavelet coefficient when Hydrophobic AASs are engaged, while the hydrophilic structures are detected at the minimum value of the wavelet coefficients [9; 133]. The 1ALI, an amphiphilic synthetic peptide, demonstrated both maximum and minimum values of wavelet coefficients in the scalogram (Fig. 3.7). The results obtained in this study appear to better identify the hydrophilic and hydrophobic domains than the previous study [109]. Based on the results shown in Fig. 3.7, the actual position of the helix structure in the amphiphilic 1ALI lies between positions 3 and 9, while the predicted region is between positions 4 and 10. The maximum coefficient is symbolized by the red region while the minimum coefficient is typified by the blue region. This is because a Hydrophobic amino acid scale is used. It has been recognized that when Hydrophilicity-based amino acid scales are engaged, hydrophilic protein residues are identified at the maximum values of the wavelet coefficient (red regions) and the hydrophobic residues at the minimum values of the wavelet coefficient (blue regions). The reverse is the case when Hydrophobic amino acid scales are used [9; 133].

Fig. 3.7 demonstrates a maximum-minimum-maximum pattern of wavelet coefficients, which signifies the hydrophobic-hydrophilic-hydrophobic nature of the amphiphilic 1ALI spanning positions 4 to 10 in the scalogram. There is a blue (hydrophilic) region situated between positions 6 and 8. The predicted position of the helix structure appears to agree with the actual position, considering that previous studies have shown that when the connecting peptide is very short, there may be an interval of one or two protein residues between the actual peptide and the predicted peptide [9; 133]. In addition, the minimum wavelet coefficients (blue spots), which represent the hydrophilic region, are found in-between the two maximum wavelet coefficients (red spots) that symbolize the hydrophobic regions. Finally, this illustrates the applicability of the CWT approach in identifying protein secondary structures using the amphiphilic peptide called 1ALI. It demonstrates how this methodology will be applied in this research in the detection of the connecting peptide of the HIV gp41 core protein and the 1DF5.
3.3 Investigations
Having shown how these methodologies are used to assess protein bio-functionalities, and with the CWT approach already demonstrated using an amphiphilic peptide 1AL1, the other procedures are demonstrated using other peptides. The methodologies are demonstrated by investigating:
1. Comparison of the Potencies of starter materials for vaccine design: Plasmodial Peptides P18 and P32.
2. Comparison of the efficacies of drugs: HIV Fusion Inhibitors (Enfuvirtide and Sifuvirtide).
3. Effect of Mutations on the pharmacological activities of Drugs.
3.3.1 Comparison of the Potencies of starter materials for vaccine design: Plasmodial Peptides P18 and P32
3.3.1.1 The Experiment
In order to show how these methods can be used in practice, amino acid sequences of the parent proteins, including Plasmodial Circumsporozoite, are retrieved from UNIPROT [79]. Based on the amino acid residues recorded in the literature, Peptides P18 and P32 are obtained from Plasmodial Circumsporozoite (CS) [134]. Peptides P18 and P32, shown in Tab. 3.2, have clinically been found to provide immunization against Plasmodium berghei in rodents [135]. They are unequal in length. While peptide P18 has 18 amino acid components, peptide P32 has 32 residues.

1. Bio-specificity
Using the EIIP scale, which is concerned with bio-recognition and bio-attachment (specificity of the antibody produced by the peptides), the RRM procedure is first demonstrated and then used to determine the degree of bio-recognition and binding interaction. One of the parameters for measuring the potency of a vaccine is its ability to accurately bind to the antigen (specificity). This is therefore computationally determined using this procedure.

Step 1: Conversion of the protein residues of the peptides into signal using EIIP values
The alphabetic codes in the sequences of the peptides P18 and P32 presented in Tab. 3.2 are replaced with the corresponding EIIP values given in Tab. 3.5 in order to obtain their numerical sequences (signals), as displayed in Figs. 3.8 and 3.9. Because they are now signals, the peptides can be analyzed using the Discrete Fourier Transform.

Step 2: Zero-padding or Up-sampling
As noted in Tab. 3.2, peptide P18 is shorter than P32 by 14 amino acids. Therefore, 14 zeros are added to P18 so as to bring both to the same window length of 32.
Figure 3.8: (A) Plasmodial Peptide P18: sequence and EIIP-based encoded values, and (B) EIIP-based signal of P18, which is the numerical sequence that is processed using DFT.

Figure 3.9: (A) Plasmodial Peptide P32: sequence and EIIP-based encoded values, and (B) EIIP-based signal of P32, which is the numerical sequence that is processed using DFT.

Step 3: Discrete Fourier Transform (DFT)
Thereafter, the numerical signals of Peptides P18 and P32 are decomposed and processed by means of the Discrete Fourier Transform (DFT). The outcome of the DFT decomposition is 32 sets of imaginary, real and absolute values called SC, which graphically describe the bio-recognition and bio-adhesion characteristics of the protein residues and peptides.

Step 4: Cross-Spectral (CS) Analysis
Cross-Spectral (CS) analysis represents the point-wise multiplication of the SC. There are no protein residues at positions 19 to 32 of the P18 and, as such, they are zero-padded. The plots of the SC of P18 and P32 are shown in Fig. 3.10, while their CS Characteristics are displayed in Fig. 3.11.

Step 5: Analysis of the results
In order to obtain the point of interaction for Bio-specificity between peptides P18 and P32, Eq. 3.10 is used. As shown in Fig. 3.11, PP is at position 12 and the longest sequence is P32, which has 32 protein residues. Therefore, the CF can be derived as:
$$CF = \frac{PP}{N} = \frac{12}{32} = 0.375 \tag{3.12}$$
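The zero-padding in Step 2 can be sketched as follows. The function names are hypothetical, and because the P18 and P32 sequences (Tab. 3.2) are not reproduced in this section, placeholder signals of length 18 and 32 are used purely to show the mechanics.

```python
def zero_pad(signal, window_length):
    """Append zeros so that the signal reaches the common window length."""
    if len(signal) > window_length:
        raise ValueError("signal is longer than the requested window")
    return list(signal) + [0.0] * (window_length - len(signal))

def pad_to_longest(signals):
    """Bring every signal to the length of the longest one."""
    window_length = max(len(s) for s in signals)
    return [zero_pad(s, window_length) for s in signals]

# Placeholder signals standing in for the EIIP-encoded P18 and P32
# (hypothetical values, chosen only for their lengths of 18 and 32).
short_signal = [0.0829] * 18
long_signal = [0.0373] * 32

padded_short, padded_long = pad_to_longest([short_signal, long_signal])
print(len(padded_short), len(padded_long))
```

After padding, both signals have the same number of DFT bins, so their spectra can be multiplied point-wise in Step 4.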
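Table 3.5: Numerical values of amino acid scales: CHAM830107 and KYTJ820101.

| Amino Acid | CHAM830107 | KYTJ820101 |
|------------|------------|------------|
| A | 0.00 | 1.8 |
| R | 0.00 | -4.5 |
| N | 1.00 | -3.5 |
| D | 1.00 | -3.5 |
| C | 0.00 | 2.5 |
| Q | 0.00 | -3.5 |
| E | 1.00 | -3.5 |
| G | 1.00 | -0.4 |
| H | 0.00 | -3.2 |
| I | 0.00 | 4.5 |
| L | 0.00 | 3.8 |
| K | 0.00 | -3.9 |
| M | 0.00 | 1.9 |
| F | 0.00 | 2.8 |
| P | 0.00 | -1.6 |
| S | 0.00 | -0.8 |
| T | 0.00 | -0.7 |
| W | 0.00 | -0.9 |
| Y | 0.00 | -1.3 |
| V | 0.00 | 4.2 |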
In this application, the CF for the two Plasmodial peptides P18 and P32 is 0.375, which is at position 12. The results obtained in this experiment, in combination with those from the sensitivity of Plasmodial Peptides P18 and P32 as starter materials for vaccine design, are discussed in section 3.3.1.2.

Figure 3.10: Results of the Spectral characteristics of (A) Peptide P18 and (B) Peptide P32, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis. The Figure demonstrates amplitude of 1.00 at position 18.

2. Bio-Sensitivity
This involves the Informational Spectrum Method (ISM). As shown earlier, Bio-Specificity assesses the inter-molecular interaction between the Plasmodial and host peptides involved in the production of the immunizing antibody using RRM. On the other hand, Bio-Sensitivity evaluates the intra-molecular interaction between the contents of the peptides. As in the case of RRM, the ISM procedure starts with the conversion of the alphabetic codes of the P18 and P32 peptide residues (Tab. 3.2) into numerical sequences using one of the AASs that govern the interaction. The interaction between these Plasmodial peptides and their target protein Glycosaminoglycan (GAG) is found to involve a negatively charged carboxyl group [136]. As a result, five charge transfer-based AASs are engaged. They are shown in Tab. 3.5. They have descriptor names CHAM830107, CHAM830108, FAUJ880111, FAUJ880112 and KLEP840101 and are retrieved from [137]. Using the charge transfer-based AAS CHAM830107 shown in Tab. 3.5, two signals are obtained for peptides P18 and P32, which are further processed by means of the Discrete Fourier Transform to obtain Informational Spectra (IS) as shown in Fig. 3.12. Point-wise multiplication of the IS of both peptides, P18 and P32, yields a Common Informational Spectrum (CIS) shown in Fig. 3.13, which demonstrates a CF at position 3.
Figure 3.11: Results of the Cross-spectral characteristic of Peptides P18 and P32 showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis. The Figures demonstrate amplitude of 1.00 at position 18
Figure 3.12: Results of the CHAM830107-based Informational Spectra (IS) of (A) P18 and (B) P32, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis.
Figure 3.13: Results of the CHAM830107-based Common Informational Spectrum (CIS) of P18 and P32, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis.

Using Eq. 3.10, the CF of P18 and P32 is obtained as:
$$CF = \frac{PP}{N} = \frac{3}{32} = 0.094 \tag{3.13}$$
From this result, the CF of the two Plasmodial peptides based on CHAM830107 is 0.094. Using the amplitudes of the protein residues of P18 and P32 at the CF, the CHAM830107-based Bio-Sensitivity of the two peptides is studied. The total Bio-Sensitivity of the two peptides is then derived by engaging all the AASs involved. The results are presented in Tab. 3.6 and discussed in section 3.4.1.

3.3.1.2 Results and Discussions
The results of the methods used to analyze P18 and P32 have been given in detail. However, in order to show how they relate to the biological concept, previous clinical studies of these proteins are first discussed. This is followed by the results obtained by using Bioinformatics methods based on the Fourier Transform-oriented Signal Processing technique.
Table 3.6: Percentage predicted biological functionalities of P18 and P32, showing the input by each amino acid scale and the positions of interaction.

| S/No | Scale | P18 | P32 | CF |
|------|-------|-----|-----|----|
| A | Specificity | | | |
| 1 | EIIP | 100% | 100% | 0.375 |
| B | Sensitivity | | | |
| 2 | CHAM830107 | 100% | 89.80% | 0.094 (position 3) |
| 3 | CHAM830108 | 100% | 81.10% | 0.438 (position 14) |
| 4 | FAUJ880111 | 100% | 100% | 0.032 (position 1) |
| 5 | FAUJ880112 | 71.6% | 100% | 0.156 (position 5) |
| 6 | KLEP840101 | 88.80% | 100% | 0.094 (position 3) |
| | Average | 93.40% | 95.15% | 0.094 (position 3) |

Peptides P18 and P32 from the Plasmodium falciparum have been identified to inhibit Plasmodium berghei invasion of Hep-G2, while P32 is found to protect immunized mice [135]. Interaction between the CS and the hepatocytes and subsequent invasion of the Hep-G2 by the Plasmodial sporozoites have been recognized [138; 139]. The negatively charged carboxyl group of the GAGs is found to partake in the binding of the CS to HSGP [136]. As a result, AASs that engage charges (positive and negative) are employed in this study. The interactions involved here relate to the sensitivity of the vaccine. Prior to this interaction, there is bio-recognition and bio-attachment, which accounts for the specificity of the vaccine.

The result of the EIIP demonstrated maximum and equal potency in the two Plasmodial peptides P18 and P32. As shown in Fig. 3.10, both P18 and P32 have maximum amplitudes of 1.00 at the CF, which is at Position 12. This appears to indicate 100% bio-recognition and bio-attachment for the two peptides (P18 and P32). Though these peptides are still being studied for possible use in the design of vaccines, the outcome of this study appears to suggest that both peptides achieved the highest level of specificity and, as a result, may be considered suitable starter materials.

In the case of sensitivity, the five charge transfer-based AASs are engaged. They are shown in Tab. 3.5. They have descriptor names CHAM830107, CHAM830108, FAUJ880111, FAUJ880112 and KLEP840101 and are retrieved from [137]. The results are presented in Fig. 3.14. The FAUJ880111-based study demonstrated maximum (100%) interaction in both peptides, P18 and P32. The other scales disclose unequal potencies for the peptides. Though CHAM830107 and CHAM830108 reveal maximum (100%) interaction for P18, they both show lower interaction for P32: the CHAM830107 scale yields 89.8% interaction while CHAM830108 produces 81.1% interaction for P32. On the other hand, the FAUJ880112 and KLEP840101-based analyses disclose maximum (100%) interaction for P32 and lower interaction for P18: the FAUJ880112 scale reveals 71.6% interaction while the KLEP840101-based analysis gives 88.8% for P18. As shown in Fig. 3.14, both peptides demonstrated 100% bio-specificity using the EIIP scale, while the average bio-activity of P18 is 93.40% and that of P32 is 95.15%. This signifies that both peptides are suitable for Malaria vaccine design. These results suggest different sensitivity and, as such, different capability of the antibody generated in immunizing host organisms against Malaria. P32 seems to be more potent, though both achieved equal potency in terms of specificity.

Figure 3.14: Predicted efficacy of P18 (blue) and P32 (orange) as starter materials for Malaria vaccine, showing the comparison between the biological input by each amino acid scale in the design of the Malaria vaccine.

Based on this investigation, it can be understood that determining biological functionalities by means of DSP techniques appears to be beneficial in designing and developing drugs and vaccines. The introduction of Reverse Vaccinology in the design of vaccines has resulted in the use of protein fragments or peptides as starter materials for vaccine design. Determining the biological characteristics of these peptides computationally, rather than obtaining results through clinical experimentation, remains a more rational approach. Two DSP techniques, namely RRM and the Informational Spectrum Method (ISM), are employed in determining the biological behaviours of two peptides (P18 and P32) which are being investigated for possible use as starter materials for the design of anti-Malaria vaccines. The results presented in this chapter revealed that both P18 and P32 share maximum affinity (100%), which seems to suggest that they offer high specificity that could result in the production of appropriate antibody, hence potent vaccines. Other interactions studied by means of the AASs engaged, which relate to the potency (sensitivity and neutralization power) of the antibody produced, are found to be high.
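The averages reported in Tab. 3.6 can be checked with a short calculation. The sketch below takes the arithmetic mean over all six scales (EIIP plus the five charge transfer-based scales); that this is the averaging used is an inference from the tabulated numbers rather than something stated in the text, and the variable names are illustrative.

```python
# Percentage values from Tab. 3.6 as (P18, P32) pairs.
table_3_6 = {
    "EIIP":       (100.0, 100.0),
    "CHAM830107": (100.0, 89.80),
    "CHAM830108": (100.0, 81.10),
    "FAUJ880111": (100.0, 100.0),
    "FAUJ880112": (71.6, 100.0),
    "KLEP840101": (88.80, 100.0),
}

def column_mean(rows, column):
    """Arithmetic mean of one column across all scales."""
    values = [row[column] for row in rows.values()]
    return sum(values) / len(values)

p18_mean = column_mean(table_3_6, 0)
p32_mean = column_mean(table_3_6, 1)
print(f"P18: {p18_mean:.2f}%  P32: {p32_mean:.2f}%")
```

Under this assumption the two means agree with the Average row of Tab. 3.6.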
3.3.2 Comparison of the efficacies of drugs: HIV Fusion Inhibitors (Enfuvirtide and Sifuvirtide)
3.3.2.1 The Experiment
Both RRM and ISM are used to study the pharmacological activities of two anti-HIV/AIDS agents that act as Fusion Inhibitors in order to compare their potencies. They are Enfuvirtide and Sifuvirtide. Enfuvirtide is already approved for use by the FDA and it has been one of the mainstay therapies in HIV/AIDS management. Sifuvirtide is still undergoing clinical trials, though it has been claimed to be efficacious. RRM is engaged to assess bio-recognition and bio-attachment (inter-molecular interaction) using EIIP, while ISM is used, together with all the AASs that control the mechanisms of action of the drugs, to evaluate the contributions of the intra-molecular interactions. RRM and ISM procedures are therefore both engaged. The results and discussions are presented in section 3.4.2.2.

Table 3.7: Prototypic peptide of Sifuvirtide.

| Peptide Identity | Peptide |
|------------------|---------|
| T20 | YTSLIHSLIEESQNQQEKNEQELLELDKWASLWNWF |
| SFT Prototypic Peptide | WIEWEREISNYTNQIYEILTESQNQQDRNEKDLLE |
Table 3.8: Percentage predicted biological functionalities of Enfuvirtide, Sifuvirtide and NHR, showing the contribution of each amino acid scale to the pharmacological activities of the drugs as well as their positions of interaction.

| S/No | Scale | Enfuvirtide | Sifuvirtide | NHR | CF |
|------|-------|-------------|-------------|-----|----|
| A | Specificity | | | | |
| 1 | EIIP | 0.815 (81.5%) | 1.000 (100%) | 0.778 (77.8%) | 0.281 (position 9) |
| B | Sensitivity | | | | |
| 2 | BURA740101 | 0.467 (46.7%) | 0.639 (63.9%) | 0.479 (47.9%) | 0.562 (position 18) |
| 3 | PONP800104 | 0.745 (74.5%) | 0.764 (76.4%) | 1.000 (100%) | 0.283 (position 13) |
| 4 | PRAM900102 | 0.648 (64.8%) | 0.785 (78.5%) | 0.821 (82.1%) | 0.283 (position 13) |
| 5 | ARGP820101 | 0.810 (81.0%) | 1.000 (100%) | 0.915 (91.5%) | 0.283 (position 13) |
| 6 | ENGD860101 | 0.637 (63.7%) | 0.851 (85.1%) | 0.907 (90.7%) | 0.283 (position 13) |
| 7 | FASG890101 | 0.841 (84.1%) | 0.885 (88.5%) | 1.000 (100%) | 0.283 (position 13) |
| 8 | JURD980101 | 0.828 (82.8%) | 0.971 (97.1%) | 0.986 (98.6%) | 0.283 (position 13) |
| 9 | WOLR790101 | 0.800 (80.0%) | 0.795 (79.5%) | 0.950 (95.0%) | 0.283 (position 13) |
| | Total | 72.33% | 85.42% | 87.07% | |

3.3.2.2 Results and Discussions

Previous clinical studies on the two HIV Inhibitors Enfuvirtide and Sifuvirtide are first examined in order to appropriately choose the AASs to be engaged. This also helps to relate our computationally derived results to the initial clinical findings. Sifuvirtide, a product of biomedical engineering, is claimed to be more potent, highly effective against T20-resistant strains, safer and better tolerated than T20 [68]. The Prototypic Peptide, as shown in Tab. 3.7, is obtained from gp41 of HIV-1 subtype E. By means of biomedical engineering approaches, it was re-designed to obtain Sifuvirtide by introducing a salt bridge. This led to increased helicity and stability of the Six Helix Bundle (coiled-coil) formed by the anti-HIV/AIDS peptides and the target protein, the N-terminus Heptad Repeat (NHR), which determines the anti-HIV/AIDS potency [140; 141]. The biomedical engineering design of Sifuvirtide was achieved by the introduction of the charged amino acids glutamic acid and lysine. In addition, to provide a hydrophobic pocket, Glutamic acid at position 119 was replaced with Threonine [68]. Serine was also added to the N-terminus so as to increase its stability. The biomedical engineering of the peptide called Prototypic Peptide 7 into Sifuvirtide resulted in the alteration of the helicity and hydrophobicity
properties. Therefore, three Alpha Helix-related AASs namely, BURA740101, PONP800104, and PRAM900102 are engaged in this study. Additionally, five Hydrophobicity-based AASs including ARGP820101, ENGD860101, FASG890101, JURD980101 and WOLR790101 are employed in studying the pharmacological activity of the two Fusion inhibitors (anti-HIV/AIDS agents). Based on the result of these preliminary clinical studies therefore, 8 Helicity-oriented and Hydrophobicity-based AASs are considered. The target protein, N-terminus Heptad Repeat (NHR) is also investigated to show its contribution in the interaction. The outcomes of the entire assessment including bio-recognition, bio-affinity, which uses EIIP, and other interactions computational investigations are as shown in Tab. 3.8 using eight AASs.
Figure 3.15: Predicted potency of the anti-HIV/AIDS drugs Enfuvirtide (red) and Sifuvirtide (pink), demonstrating the comparison of the pharmacological activities of the two anti-retroviral agents.

As shown in Tab. 3.8, the outcome of the CIS of Enfuvirtide, Sifuvirtide and NHR derived with the eight AASs reveals a CF at 0.283 (position 13), while those of the Hydrophobicity-based scale PONP800104 and the Alpha Helix-based ARGP820101 are at the same position. However, for the RRM, which uses EIIP, the CS analysis reveals a CF at 0.281 (position 9), while the CF of BURA740101 is at 0.562 (position 18). The amplitude value for the SC at the CF for Sifuvirtide using EIIP is 1.00, which appears to suggest 100% affinity, while that of Enfuvirtide demonstrates 81.5% affinity. As shown in Fig. 3.15, Sifuvirtide is found to demonstrate higher amplitude values than Enfuvirtide in all other scales. The results derived by means of all the AASs involved are shown in Tab. 3.8.
While Sifuvirtide displayed an average pharmacological activity of 85.42%, Enfuvirtide has 72.33%. This appears to concur with the preliminary clinical finding, which identified Sifuvirtide as exhibiting better efficacy than Enfuvirtide [68]. Preliminary biophysical findings have revealed that Sifuvirtide has six-fold higher HIV fusion inhibitory activity. It has been shown that Sifuvirtide not only forms the Six Helix Bundle (SHB) with the target protein (NHR) but also blocks other peptides. This is unlike Enfuvirtide. These results were further authenticated by means of CD spectroscopy, which indicated 93% alpha-helical content for Sifuvirtide only. These factors may have been responsible for the claim that Sifuvirtide is more efficacious, as it provides more interaction with the NHR, which is known to possess hydrophobic pockets. The initial findings also appear to agree with the biological functionalities calculated in this study using AASs that relate to the physiologic indices employed in the clinical experiments. The indices are governed by Helicity and Hydrophobicity.

Ultimately, the DSP approach to determining biological functionalities expresses numerically the degree of interaction between the proteins involved. It is a rational approach that aids in the comparison of the pharmacological activities of peptide-based drugs such as Enfuvirtide (T20) and Sifuvirtide. The two techniques, namely RRM and the Informational Spectrum Method (ISM), are used in this investigation to compare the potencies of Enfuvirtide (T20) and Sifuvirtide in order to assess the claim that Sifuvirtide is more efficacious than Enfuvirtide. The results presented in this investigation disclose that Sifuvirtide has higher pharmacological activities than Enfuvirtide across the 8 AASs, which are associated with the physiologic indices clinically examined. This therefore suggests that Sifuvirtide is more efficacious.
These results are not interpreted in terms of pharmacokinetic and pharmacodynamic properties such as solubility, absorption, shelf-life, toxicity, distribution, excretion and safety. As a result, the therapeutic index is not given. This is because the method engaged is not designed for investigations into toxicity and safety.
3.3.3 Effect of Mutations on the pharmacological activities of Drugs
3.3.3.1 The Experiment
The Plasmodial target peptide nDARC and its mutant nDARC Y41F are used to study the effect of mutations on the pharmacological activities of drugs. Sequence information including mutations has been found to help assess resistance arising from exposure of drugs to their target proteins. This is investigated in this research using five anti-retroviral agents. The methodology is therefore first demonstrated using the Plasmodial target peptide nDARC and its mutant nDARC Y41F. Preliminary studies on the mode of interaction between the parent compound of the two peptides and the target proteins are first examined in order to justify the choices of the AASs. The sequences of the wild (nDARC) and mutant (nDARC Y41F) strains are then analyzed with RRM and ISM using all the AASs that relate to the mechanism of the bio-functionalities. This is in order to assess the effect of one point mutation on the activity of the peptide and its mutant. The results are discussed in section 3.3.3.2.

Table 3.9: The nDARC peptide and mutant nDARC Y41F engaged in the study of the effect of mutations on the pharmacological activities of drugs.
Peptide Identity   Peptide
nDARC              AELSPSTENSSQLDFEDVWNSSYGVNDSFPDGDYD
nDARC Y41F         AELSPSTENSSQLDFEDVWNSSYGVNDSFPDGDFD
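As a quick check on Table 3.9, the single substitution that separates the two peptides can be located programmatically. The following is a minimal sketch only. The function name `point_mutations` and the `offset` parameter are illustrative assumptions, and the offset of 7 used to recover the label Y41F is inferred from that label rather than stated in the text.

```python
# Sequences copied from Table 3.9.
NDARC = "AELSPSTENSSQLDFEDVWNSSYGVNDSFPDGDYD"
NDARC_Y41F = "AELSPSTENSSQLDFEDVWNSSYGVNDSFPDGDFD"


def point_mutations(wild, mutant, offset=0):
    """Return (position, wild_residue, mutant_residue) for every mismatch.

    Positions are 1-based within the peptide and shifted by `offset`,
    a hypothetical parameter for mapping onto full-protein numbering.
    """
    if len(wild) != len(mutant):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1 + offset, w, m)
        for i, (w, m) in enumerate(zip(wild, mutant))
        if w != m
    ]


print(len(NDARC))                          # 35
print(point_mutations(NDARC, NDARC_Y41F))  # [(34, 'Y', 'F')]
# Assumed offset of 7 (inferred from the label "Y41F", not from the text).
print(point_mutations(NDARC, NDARC_Y41F, offset=7))  # [(41, 'Y', 'F')]
```

The peptides differ at one residue only, which is what makes the pair a convenient test case for the effect of a single point mutation.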
3.3.3.2 Results and Discussions
Sequence information including mutations has been found to be helpful in assessing drug resistance arising from exposure of drugs to their target sites. This is demonstrated in this research using five anti-HIV/AIDS drugs (Chapter 6). The methodology is therefore first demonstrated using the Plasmodial target peptide nDARC and its mutant nDARC Y41F. The clinical findings on these peptides are first studied in order to justify the choices of the AASs engaged.
Figure 3.16: The results of the Spectral characteristics of (A) nDARC and (B) nDARC mutant Y41F using EIIP, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis.
Figure 3.17: Cross-spectral features of nDARC and mutant nDARC Y41F, showing the amplitude (biological information) on the y-axis and the positions or frequencies of biological activities on the x-axis.

Clinically, the 35 protein residues designated nDARC, obtained from the N-terminal 60 protein residues of the Plasmodial target peptide, the Duffy Antigen Receptor Chemokines (DARC), have been identified to block the binding interaction with the Duffy-Binding Proteins (DBP) of Plasmodium vivax and Plasmodium knowlesi [142]. The nDARC peptide and mutant nDARC Y41F are shown in Tab. 3.9. It has also been recognized that the exchange of the Tyrosine at position 41 with Phenylalanine abrogates the nDARC inhibitory activity. Moreover, sulphonation of the Tyrosine at position 41 was also identified to be vital to the observed increase in the binding interaction that exists between the mutated peptide nDARC Y41F and the target protein [142].

Table 3.10: Percentage predicted biological functionalities of nDARC and nDARC Y41F, showing the contribution of each amino acid scale and their positions of interaction.
S/No   Scale         nDARC     nDARC Y41F   CF
A      Specificity
1      EIIP          84.4%     100.0%       0.17 (position 6)
B      Sensitivity
2      CHAM830107    100.0%    100.0%       0.17 (position 6)
3      CHAM830108    100.0%    100.0%       0.29 (position 10)
4      FAUJ880112    100.0%    100.0%       0.06 (position 2)
5      KLEP840101    100.0%    100.0%       0.06 (position 2)
       Average       96.88%    100%

One major effect of Tyrosine sulphonation is that it is a source of negative charge [143]. As a result, AASs governing bio-recognition and binding characteristics (EIIP) and other biological functionalities, including charge transfer and negative and positive charges, are employed in calculating the differences in the biological functionality of the mutant nDARC Y41F, and the results are correlated with the clinical findings. The AASs used are CHAM830107 and CHAM830108, which are based on charge transfer. In addition, FAUJ880112, which is negative charge-based, and KLEP840101, which is governed by net charges, are engaged. The overall results obtained in this study, shown in Tab. 3.10 and Fig. 3.16, reveal that the mutant nDARC Y41F has more affinity for the target protein (100%) than the parent peptide nDARC (84.4%).
This is based on the study using the amino acid scale called EIIP. For the remaining four scales, 100% interaction is achieved by all the scales that engage charge transfer and positive charges (Tab. 3.10). This mutation is reported to have abrogated the inhibitory activity of the parent peptide, nDARC [142]. This result may explain the preliminary clinical finding. It therefore demonstrates that sequence information, including mutations in the protein residues, can be engaged to assess the biological activities of proteins and peptides and, when applied to drug target proteins, could be used to assess drug resistance. The abrogation of the inhibitory activity of the Duffy Antigen Receptor Chemokines (DARC) peptide by an exchange of the Tyrosine (Y) at position 41 with Phenylalanine (F), resulting in the mutant nDARC Y41F, is used as a case study. This study therefore appears to demonstrate that determining the biological functionalities is an easier approach to comparing the pharmacological activities of drugs. It also helps determine the biological activities of the peptide components of drugs and vaccines. Manipulation of amino acid sequences for optimal biological activity is simplified when their biological functionalities are known and, better still, delivered in numerical terms.
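The averages in the last row of Tab. 3.10 can be reproduced from the per-scale percentages. The sketch below is illustrative only. The dictionary layout and the helper `column_means` are assumptions made for this example, and only the percentages themselves come from the table.

```python
from statistics import mean

# Percentages copied from Table 3.10: scale -> (nDARC, nDARC Y41F).
TABLE_3_10 = {
    "EIIP": (84.4, 100.0),
    "CHAM830107": (100.0, 100.0),
    "CHAM830108": (100.0, 100.0),
    "FAUJ880112": (100.0, 100.0),
    "KLEP840101": (100.0, 100.0),
}


def column_means(table):
    """Average predicted functionality (%) of the wild and mutant peptides."""
    wild = mean(values[0] for values in table.values())
    mutant = mean(values[1] for values in table.values())
    return round(wild, 2), round(mutant, 2)


print(column_means(TABLE_3_10))  # (96.88, 100.0)
```

Only the EIIP scale separates the two peptides, so the 3.12 percentage-point difference in the averages comes entirely from that single scale.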
Chapter 4 Resonant Recognition Model-based Approach to Identifying the Mechanism of HIV Transformation into AIDS
4.1 Summary:
In this chapter, the Resonant Recognition Model (RRM) is used to study the effect of the differences in affinity that exist between the Surface Protein, also called Glycoprotein 120 (gp120), of the Human Immunodeficiency Virus/Simian Immunodeficiency Virus (HIV/SIV) isolates and their target protein, the Cluster of Differentiation 4 (CD4). This is in order to determine how HIV transforms into AIDS. The study is conducted on four categories of HIV/SIV isolates and their target protein, the CD4. A version of this investigation is published as: Nwankwo N, Seker H. 2012. "HIV Progression to AIDS: A Bioinformatic Approach to Determining the Mechanism of Action". Current HIV Research Journal.

The mechanism by which HIV infection transforms into AIDS disease is unclear. Several factors, such as decline in immune response, increase in replication rate, syncytium-inducing capacity and the ability of the viruses to infect tumour cell lines, are associated with HIV progression to AIDS. What has not been investigated is the part played by the mutation-induced increased affinity for the CD4+ T cells of the HIV-1 T cell lymphocyte-loving (T-tropic) viruses. They are known to dominate the late stage of the HIV infection in the disease progression. To examine this role, the Resonant Recognition Model (RRM) is engaged in order to compare the degree of affinity between the host CD4 and the gp120 from the HIV-1 Macrophage-tropic (M-tropic) viruses, the HIV-1 T-tropic viruses, as well as the isolates of HIV-2 and Simian Immunodeficiency Virus (SIV). The results reveal that only HIV-1 T-tropic viruses bind effectively to the CD4, suggesting that T-tropic viruses, which are identified to have mutated from the M-tropic viruses, acquire enhanced and long-lasting attachment to the CD4. This sustained affinity brings about continued attack on the diminishing CD4 until the immune system of the host collapses, which manifests clinically as AIDS. The findings suggest an approach that should target Variable region 3 (V3) of the HIV gp120 at the early stage of the infection as part of the HIV/AIDS management procedure. This treatment procedure is essential, as early initiation of HIV/AIDS therapy is generally assumed to prevent the spread of the virus and the deterioration of immunity. The study is expected to bring better understanding of HIV pathogenesis and to re-strategize pharmaceutical approaches to designing new HIV/AIDS therapeutic interventions.
4.2 Introduction
Acquired Immune Deficiency Syndrome (AIDS) is an array of clinical signs and symptoms of immune incompetence arising from invasion and erosion of the host Cluster of Differentiation 4+ T cells (CD4+ T cells) by the Human Immunodeficiency Virus (HIV), and subsequent collapse of the immune system [144; 145]. CD4+ T cells (herewith referred to as CD4) are the principal point of contact and attack by the HIV, and as a result, CD4 destruction and depletion have been noted to be one of the major outcomes of the HIV attack [146; 147]. CD4 is found to be one of the prognostic biomarkers for assessing HIV/AIDS progression [148], the efficacy of therapies and when to initiate treatment [27]. Advancement of the HIV infection to AIDS has been recognized as a period of explosion of the progressive HIV viral population and CD4 diminution [149]. Based
on their physiological characteristics, the HIV variants (quasispecies) amongst the viral population are categorized into various groups that include CXCR4 and CCR5, depending on their co-receptor usage. Based on the capability to form multinucleated cells, HIV variants are also grouped into Syncytium-Inducing (SI) and Non-Syncytium-Inducing (NSI) [150]. In addition, in terms of tropic association, they are classified into T-tropic and M-tropic viruses. The three phases of HIV progression to AIDS consist of the early asymptomatic, intermediate and late stages [6]. The early stage of the HIV infection has been found to be dominated by M-tropic viruses, while the late stage is known to be populated by the T-tropic viruses [6; 151]. The progression from the early to the late stages is illustrated in Fig. 4.1.

Acquisition of mutations by the Variable region 3 (V3) domain of the HIV-1 gp120 has been reported and linked to the progression of HIV infection to AIDS [152] - [154]. Clinical studies have also shown that HIV progression to AIDS is associated with changes in co-receptor usage, tropism and phenotype [154; 155], immune activation level [156], replication rate, the ability to induce the formation of multinucleated bodies (referred to as Syncytium-Inducing (SI) capacity) and the ability of the viruses to infect tumour cell lines [148]. In addition, it has been shown clinically that the frequency of SI formation correlates with the CD4 depletion rate [157]. The V3 domain of the HIV gp120 has been identified as the conserved, immune-dominant region responsible for exchanges in tropism and phenotype [152] - [154; 155]. Amino acid variations in the V3 region of the HIV gp120 from both M- and T-tropic viruses have also been characterized [6; 155]. It is noted that the T-tropic viruses have more alterations in their amino acid compositions than the M-tropic viruses. As demonstrated in Fig.
4.2, based on the V3 consensus sequence determined by LaRosa et al [2], M-tropic viruses such as Yu-2 have two mutations while BAL has one. Both isolates have two deletions. Fig. 4.2 also shows that HIV-1 T-tropic viruses such as the SF33 strain harbour as many as ten mutations and two deletions, while the HB8 isolate has seven mutations and one deletion. In the SF162, two point mutations are found to be responsible for the transformation of the HIV-1 SF162 from M-tropic to T-tropic. By this property, SF162 is referred to as Dual-tropic. The mutations are I276R and A282V [158]. In order to examine and quantify the affinities of each group for the CD4, the sequences of the HIV-1 SF162 M-tropic and T-tropic viruses are extracted and investigated in this study (see Fig. 4.2).

Figure 4.1: Schematic representation of the three main stages of the HIV progression to AIDS, showing the diminished M-tropic viral population and the predominance of T-tropic viruses.

It has been reported that the progression of HIV infection to AIDS is linked to exchanges in co-receptor usage, tropism and phenotype [154; 155], which are associated with mutations in the V3 domain of the gp120 [152] - [154]. However, the mechanism still remains unknown [145; 152; 153; 156; 159]. In essence, there does not seem to be any study on how the amino acid changes in the V3 domain of the HIV gp120 lead to AIDS disease. This study is therefore designed to demonstrate how HIV infection translates to AIDS disease using a Bioinformatic technique called the Resonant Recognition Model (RRM).

Figure 4.2: Amino acid sequences of the V3 domain of HIV isolates, showing mutations in elongated letters, and SF162 Dual-tropism. The symbols * and + represent same residue and deletion, respectively. The consensus sequence is derived from [2].

The RRM has successfully been applied to various problems in biomedicine [35], which are associated with diseases like HIV/AIDS [35; 43; 98], cancer [35; 43], diabetes [35; 115] and heat shock responses linked to Heat Shock Protein (HSP) [97]. RRM utilizes the Electron-Ion Interaction Pseudo-Potential (EIIP) and translates the amino acid residues of the proteins into numerical sequences (signals). These signals are further processed using the Discrete Fourier Transform (DFT). Point-wise multiplication of these processed signals reveals a common frequency that symbolizes the position of common biological characteristics [35; 99]. This is called the Consensus Frequency (CF). According to the RRM principle, proteins which bio-recognize and bind are known to share the same CF [35].

In this study, the bio-recognition and binding interactions that exist between the host CD4 and the HIV-1 T-tropic, HIV-1 M-tropic, HIV-2 and SIV viruses are studied using RRM. This is carried out in order, firstly, to determine their level of affinity for the CD4; secondly, to identify the class with maximum affinity for the CD4; and thirdly, to determine the influence of the degrees of affinity of the HIV/SIV gp120 for the host CD4 on the progression of HIV infection to AIDS. The method proposed in this study helped determine the degree of affinity, and the results obtained reveal that only the HIV-1 T-tropic viruses appear to bind and attach firmly to the CD4, as they share a similar CF with that of the CD4. The rest of the Chapter is organized as follows: the methodology employed and the results obtained are presented in sections 2 and 3, respectively, and discussions of the results are given in section 4.
4.3 Materials and Methods:
4.3.1 Materials
Preliminary clinical experiments involving the CD4 from HIV host organisms and the gp120 from HIV and SIV isolates are first studied. This is to provide understanding of the relationships that exist between them and to classify the isolates into four groups. Amino acid sequences of the CD4 from twenty-four HIV host organisms and the gp120 from forty-seven HIV and SIV isolates are retrieved from UNIPROT [79], with accession numbers as summarized in Tabs. 4.1 - 4.3. Based on the preliminary clinical studies shown in these tables, the forty-seven HIV and SIV isolates are first categorized into twenty-two HIV-1 T-tropic and fifteen HIV-1 M-tropic viruses, five HIV-2 and five SIV isolates, as listed. The sequences retrieved from the database are analyzed using RRM as illustrated in Figure 4.3.
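The retrieval step can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the text does not say how the sequences were downloaded, the URL template below is the pattern of the current UniProt REST service (assumed here), and the helper names `parse_fasta` and `fetch_sequence` are hypothetical.

```python
from urllib.request import urlopen

# Assumed URL pattern of the current UniProt REST service (not from the text).
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Return the residue string of the first record in FASTA-formatted text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like FASTA")
    residues = []
    for line in lines[1:]:
        if line.startswith(">"):  # start of a second record: stop
            break
        residues.append(line)
    return "".join(residues)


def fetch_sequence(accession, timeout=30):
    """Download one entry by accession number (requires network access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urlopen(url, timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Offline check of the parser on a made-up record (hypothetical header).
demo = ">sp|DEMO|hypothetical record\nAELSPSTENS\nSQLDF\n"
print(parse_fasta(demo))  # AELSPSTENSSQLDF
```

A call such as `fetch_sequence("P04578")` would then return the sequence stored for the HXB2 entry of Tab. 4.1, provided the assumed endpoint is reachable.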
Table 4.1: HIV-1 T-Tropic isolates and their protein identities.
Uniprot Accession No   Isolate/Strain   Type   Group   Subtype   Reference
P05877                 MN               1      M       B         [6; 160]
P18799                 NDK              1      M       D         [161]
P03378                 ARV2/SF2         1      M       B         [162; 163]
P04578                 HXB2             1      M       B         [162; 164]
P04581                 ELI              1      M       B         [160; 165]
P19549                 SF33             1      M       B         [160]
P03377                 BRU/LAI          1      M       B         [166]
P31872                 WMJ1             1      M       B         [167]
P19551                 MFA              1      M       B         [164; 166; 168]
P03375                 BH10             1      M       B         [165; 169]
P04582                 BH8              1      M       B         [151]
P05880                 WMJ22            1      M       B         [170]
P04624                 HXB3             1      M       B         [171]
P04580                 Z6               1      M       B         [6; 155]
P05878                 SC               1      M       B         [172]
P05881                 Z321             1      M       B         [172]
P05879                 CDC-451          1      M       B         [173; 174; 175]
P12488                 BRVA             1      M       B         [176]
O89292                 93BR020          1      M       F1        [177]
P20871                 JRCSF            1      M       B         [160]
O70902                 90CF056          1      M       H         [177]
4.4 Resonant Recognition Model (RRM)
RRM has been utilized by several researchers to analyze the physiological properties of proteins [35; 43; 98; 115]. Details of the method can be found in Chapter 3. The RRM procedure provides a Frequency Spectrum, a plot of the spectral characteristics that represent the biological information embedded in proteins. The plot presents the amplitudes or peaks (y-axis)
Table 4.2: HIV-1 M-tropic isolates and their protein identities.
Uniprot Accession No   Isolate/Strain   Type   Group/Clone   Subtype     Reference
O40222                 ADA              1      AD8           B           [6]
P04579                 RF/HAT3          1      M             B           [178]
Q2MKA9                 Bal              1      1             B           [6]
Q70626                 LW123            1      M             B           [164]
Q75760                 JRFL             1      M             B           [6]
P05882                 Z84              1      M             B           [179; 180]
O41803                 92NG083          1      M             G           [181; 182]
O12164                 92BR025          1      M             C           [181]
Q75008                 ETH2220          1      M             C           [183]
Q9WC60                 SE9280           1      M             J           [184]
Q9WC69                 SE9173           1      M             J           [184]
P20888                 Oyi              1      M             B           [185]
P35961                 Yu-2             1      M             B           [6]
O91086                 YBF30            1      N             Not Found   [186]
Table 4.3: HIV-2 and SIV isolates and their protein identities.
Uniprot Accession No   Isolate/Strain   Type   Group       Subtype     Reference
P32536                 ST/24.1C No.     2      Not Found   A           [187]
P15831                 D205             2      Not Found   B           [188]
P24105                 CAM2             2      Not Found   A           [189]
P18040                 Ghana-1          2      Not Found   A           [162; 190]
Q76638                 UC1              2      Not Found   B           [187; 191]
Q02837                 AGM gr-1         SIV    agm.gri     Not Found   [192; 193]
P08810                 Mm251            SIV    mac         Not Found   [193]
P17281                 CPZ GAB1         SIV    cpz         Not Found   [179; 194]
Q1A261                 MB66             SIV    cpz         Not Found   [179]
Q8AIH5                 TAN1             SIV    cpz         Not Found   [179; 194]
as the measure of the binding interaction and the x-axis as the frequency or position of the interaction, which is scaled to 0.5 [35]. Cross-Spectral (CS) results are known to present symmetric (mirror) images [126]. One of the ways in which the frequency domain can be represented is as a fraction of the protein residues ranging from 0 to 0.5, which is half of the symmetric (mirror) image of the frequency spectrum. This fraction lies between the DC component and one-half of the length of the protein residues [88]. The vertical axis (y-axis) of the spectral plot, which represents the amplitude, is scaled using the maximum value, so that the highest value on the y-axis is 1 or 100%. The amplitudes represent the degrees of affinity. This is because the amino acid scale engaged in this study (EIIP) is a function of bio-recognition and binding interaction [35]. Bio-recognition and binding interaction are a consequence of energy transfer between the protein molecules as their valence electrons spin, pull and reverberate in an electromagnetic field [35]. The computational procedure, which is illustrated in Fig. 4.3 and described in Chapter 3, is applied to the four categories of HIV and SIV isolates in order to determine the level of affinity between each class and the CD4. This is in order to proffer insight into their roles in the advancement of HIV infection to AIDS. The results are presented in Section 4.5.
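The procedure described above (EIIP encoding, DFT, point-wise multiplication, scaling to a maximum of 1 and reading the peak over the range 0 to 0.5) can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation used in this study. The EIIP values are those commonly tabulated in the RRM literature and should be checked against Chapter 3. Mean removal, zero-padding to the longest sequence and all function names are assumptions of this sketch.

```python
import numpy as np

# EIIP values (Ry) as commonly tabulated in the RRM literature (assumed here).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def amplitude_spectrum(seq, n):
    """Amplitudes at positions 1..n//2 (DC dropped, mirror half discarded)."""
    signal = np.array([EIIP[residue] for residue in seq], dtype=float)
    signal = signal - signal.mean()        # assumption: remove the DC offset
    transform = np.fft.fft(signal, n=n)    # assumption: zero-pad to length n
    return np.abs(transform[1 : n // 2 + 1])


def cross_spectrum(seqs):
    """Point-wise product of the spectra, scaled so that the maximum is 1."""
    n = max(len(seq) for seq in seqs)
    product = np.ones(n // 2)
    for seq in seqs:
        product *= amplitude_spectrum(seq, n)
    return product / product.max(), n


def consensus_peak(seqs):
    """Return (peak position, CF = position / N, scaled cross-spectrum)."""
    scaled, n = cross_spectrum(seqs)
    position = int(np.argmax(scaled)) + 1  # positions are counted from 1
    return position, position / n, scaled


# Usage with the two peptides of Table 3.9.
peptides = [
    "AELSPSTENSSQLDFEDVWNSSYGVNDSFPDGDYD",
    "AELSPSTENSSQLDFEDVWNSSYGVNDSFPDGDFD",
]
position, cf, scaled = consensus_peak(peptides)
print(position, round(cf, 4), scaled.max())
```

Dividing by the maximum is what places the highest peak at 1 (100%). Whether the published figures were produced with the same padding and mean-removal choices is not stated in this chapter, so the printed peak need not coincide with the tabulated values.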
4.5 Results
The results obtained are presented under main properties of RRM, namely Consensus Frequency (CF), Cross-spectral Analysis and finally, Spectral features obtained through the analysis of all the protein sequences.
4.5.1 Consensus Frequency (CF):
According to the RRM procedure [35], the relationship between the CF and Peak Position (PP) can be expressed as:

\[ CF = \frac{PP}{N} \tag{4.1} \]
Figure 4.3: Flow chart of the methodology engaged in the analysis of the HIV transition to AIDS.
where N is the number of amino acids in the protein with the longest sequence. In this study, the longest sequence amongst the gp120 of the HIV and SIV viruses is 508 while the longest sequence from the CD4 of the hosts analyzed is 482. The Peak Position (PP) for both CD4 and HIV gp120 is 18 each. For the four datasets used, the value of CF is calculated using Eq. 4.1. By this calculation, the CF of the CD4 is found to be 0.0373 and it is 0.0354 when HIV and SIV are combined. It is also observed that the HIV-1 T-tropic viruses group demonstrates CF of 0.0354 while the M-tropic shows a CF of 0.1045. The CF of the HIV-2 is 0.3714 and that of the SIV is 0.2362. These results are further used to investigate the mechanism of HIV progression to AIDS by comparing the affinity between the four classes of HIV and SIV, namely HIV-1 T-tropic and HIV-1 M-tropic, HIV-2 and SIV viruses.
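The arithmetic behind the CF values quoted above can be checked directly from the relation CF = PP/N. The short sketch below uses only the peak positions and sequence lengths stated in the text, and the function name `cf_from_peak` is illustrative.

```python
def cf_from_peak(peak_position, n_longest):
    """CF = PP / N, with N the length of the longest sequence in the group."""
    if n_longest <= 0:
        raise ValueError("N must be positive")
    return peak_position / n_longest


# Host CD4: PP = 18, N = 482.
print(round(cf_from_peak(18, 482), 4))  # 0.0373
# HIV/SIV gp120 combined: PP = 18, N = 508.
print(round(cf_from_peak(18, 508), 4))  # 0.0354
```

The same peak position (18) therefore yields slightly different CF values for the CD4 and the gp120 groups solely because the longest sequences differ in length.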
4.5.2 Cross-spectral Analysis
1. The Host CD4: As indicated in Fig. 4.4 and Tab. 4.4, the CS analysis of the CD4, consisting of amino acid sequences from the CD4 of 25 HIV host organisms, revealed a prominent amplitude of 1.00 at its CF (position 18, which is equivalent to CF=0.0373). This outcome seems to agree with the preliminary finding [98].

Table 4.4: Binding capabilities of the group of CD4, HIV and SIV Isolates
Isolate          No of isolates   CF       Amplitude at the CF   % Affinity
CD4              25               0.0373   1.00                  100% (to HIV)
HIV/SIV gp120    47               0.0354   1.00                  100% (to CD4)
HIV-1 T-tropic   22               0.0354   1.00                  100% (to CD4)
HIV-1 M-tropic   15               0.1045   0.001916              0.1916% (to CD4)
HIV-2            5                0.3714   0.008874              0.8874% (to CD4)
SIV              5                0.2362   0.1308                13.08% (to CD4)
Figure 4.4: Cross-spectral feature of the host CD4, showing the point of binding interaction with the HIV/SIV as position (x) = 18 or CF = 0.0373, and the degree of binding interaction as an amplitude of 1.0000, suggesting 100% affinity.

2. The HIV and SIV gp120 combined: The RRM-based CS analysis of the gp120 belonging to the HIV-1 T-tropic and HIV-1 M-tropic viruses and the HIV-2 and SIV isolates put together revealed a CF of 0.0354, with an amplitude of 1.0000 (Tab. 4.4). This appears to indicate that the gp120 of all four groups interact with the host CD4, but at different levels. This is demonstrated as each group displays a different amplitude at the point of contact.

3. The HIV-1 T-tropic Viruses: The CS analysis of the HIV-1 T-tropic viruses results in a CF of 0.0354 at position 18 with an amplitude of 1.0000 (Fig. 4.5, Tab. 4.4). According to the RRM principle, proteins which bio-recognise and bind are known to share the same CF [35]. Therefore, our results seem to indicate that only the HIV-1 T-tropic viruses bind to the host CD4, as demonstrated by the fact that they share a similar CF with the CD4 of the host organisms.
Figure 4.5: Cross-spectral feature of the HIV-1 T-tropic viruses, showing the point of binding interaction with the HIV/SIV as position (x) = 18 or CF = 0.0354, and the degree of binding interaction as amplitude (y) = 1.0000.

4. The HIV-1 M-tropic Viruses: As shown in Fig. 4.6 and Tab. 4.4, the CS analysis of the HIV-1 M-tropic viruses discloses a weak amplitude of 0.001916 at 0.035, which implies weak affinity. However, the analysis also shows a high amplitude of 1.0000 at position 51 or CF=0.1045, a position that is markedly different from the CF of the CD4. This symbolizes insignificant binding interaction with the host CD4.

Figure 4.6: Cross-spectral feature of the HIV-1 M-tropic viruses, showing weak interaction with the CD4 as the degree of binding interaction, amplitude (y) of 0.001916 at the point of interaction with the CD4 (position (x) = 18 or CF = 0.0373). A maximum amplitude (y) = 1.00 at position (x) = 51 or CF = 0.1045 discloses a maximum interaction with an as yet unidentified protein.

5. The HIV-2 Isolates: An insignificant amplitude of 0.008874 is observed at the point of interaction with the CD4, position 18 or CF=0.0373 (Fig. 4.7), suggesting that HIV-2 affinity for the CD4 is weak. On the other hand, this analysis discloses a marked peak with an amplitude of 1.0 at its CF (position 182 or CF=0.3714), indicating a CF dissimilar to that of the host target site, the CD4. Dormancy and a reduced tendency to viral entry into the host CD4 by HIV-2 viruses, which seem to have resulted from poor interaction with the CD4, have been recognized. HIV-2 infection is also found to circumvent the use of CD4 [195; 196].

Figure 4.7: Cross-spectral feature of the HIV-2 isolates, showing weak interaction with the CD4 as the degree of binding interaction, amplitude (y) of 0.008874 at the point of interaction with the CD4 (position (x) = 18, CF = 0.0373). A maximum amplitude (y) = 1.00 at position (x) = 182 or CF = 0.3714 reveals a maximum interaction with an as yet unidentified protein.

6. The SIV Isolates: The CS analysis of the SIV shows a low amplitude of 0.1308 at the point of contact with the CD4 (Figure 4.8, Tab. 4.4). This implies weak affinity for the CD4. The CF of the SIV isolates is at position 120 or CF=0.2362, which presents a maximum amplitude of 1.0000. The CF of the SIV isolates (position 120 or CF=0.2362), which is different from the CF of the CD4 (position 18 or CF=0.0373), may signify a lack of bio-recognition and binding interaction. Latency and diminished pathogenicity of SIV, as observed in this experiment, have however been acknowledged to arise from weak interaction with the CD4 [195; 196].
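The reasoning applied to the six cases above (a group is taken to bind the CD4 when its CF is close to that of the CD4, and the amplitude at the CD4 position is read as percentage affinity) can be expressed compactly. The sketch below uses the values of Tab. 4.4. The tolerance of 0.005 is a hypothetical choice made for illustration, since the text does not state how close two CFs must be to count as shared.

```python
# (CF of the group, amplitude at the CD4 position), copied from Table 4.4.
GROUPS = {
    "HIV-1 T-tropic": (0.0354, 1.00),
    "HIV-1 M-tropic": (0.1045, 0.001916),
    "HIV-2": (0.3714, 0.008874),
    "SIV": (0.2362, 0.1308),
}
CD4_CF = 0.0373


def shares_cf(cf, reference_cf, tolerance=0.005):
    """True when two CFs agree within `tolerance` (hypothetical threshold)."""
    return abs(cf - reference_cf) <= tolerance


def percent_affinity(amplitude):
    """Scaled amplitude (0..1) expressed as a percentage."""
    return amplitude * 100.0


for name, (cf, amplitude) in GROUPS.items():
    print(f"{name}: shares CF with CD4 = {shares_cf(cf, CD4_CF)}, "
          f"affinity = {percent_affinity(amplitude):.4g}%")
```

With this threshold only the HIV-1 T-tropic group is flagged as sharing the CF of the CD4, which mirrors the conclusion drawn in the text.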
4.5.3 Spectral Features:
Figure 4.8: Cross-spectral feature of the SIV isolates, showing weak interaction with the CD4 as the degree of binding interaction, amplitude (y) of 0.1308 at the point of interaction with the CD4 (position (x) = 18 or CF = 0.0373). A maximum amplitude (y) = 1.00 at position (x) = 120 (CF = 0.2362) suggests a maximum interaction with an as yet unidentified protein.

1. The Cercopithecus aethiops (Green monkey): The Spectral Features of the CD4 from the host species (Tab. 4.5) show that the green monkey demonstrates a high amplitude of 0.7699, signifying 76.99% binding at the CF of the CD4 (position 18 or CF=0.0373). It is also observed that human CD4 provides 52.76% affinity for the HIV, while Chimpanzee and Dog have 52.80% and 50.70%, respectively.

Table 4.5: Binding capacities of the host CD4 to the HIV.
CD4 Host       Binding Capacity
Human          52.76%
Chimpanzee     52.80%
Dog            50.70%
Green monkey   76.99%
2. The HIV-1 T-tropic virus (HXB3): As shown in Fig. 4.9 and Tab. 4.6, the spectral feature of the HXB3 reveals a high amplitude of 0.9655 at the point of interaction with the CD4 (position 18 or CF=0.0373), suggesting that the HXB3 isolate, which is an HIV-1 T-tropic virus [171], has high affinity for the CD4.
Figure 4.9: The Spectral Characteristics of one isolate from each group, showing (A) a high amplitude of 0.9655 (96.55% affinity) for the HIV-1 T-tropic HXB3 and (B) a low amplitude of 0.2045 (20.45% affinity) for the HIV-1 M-tropic YBF30 at the point of interaction with the CD4 (position (x) = 18, CF = 0.0354). This appears to demonstrate high interaction between the HIV-1 T-tropic HXB3 and the CD4, and weak interaction between the HIV-1 M-tropic YBF30 and the CD4.
Table 4.6: Affinities of exemplary isolates from each class to the CD4, showing the level of interaction between one isolate from each class and the CD4
Isolate from Each Class   Percentage Affinity to CD4
HXB3 (HIV-1 T-tropic)     96.55%
YBF30 (HIV-1 M-tropic)    20.45%
Ghana-1 (HIV-2)           40.45%
CPZ GAB1 (SIV)            37.82%
3. The HIV-1 M-tropic virus (YBF30): YBF30, an HIV-1 M-tropic (macrophage-loving) virus that does not replicate in T-cells [186], displays a low amplitude of 0.2045 at the point of interaction with the CD4 (position 18 or CF=0.0373), as demonstrated in Fig. 4.9.
Figure 4.10: The Spectral Characteristics of one isolate from each group, showing low amplitudes of (A) 0.4045 (40.45% affinity) for HIV-2 Ghana-1 and (B) 0.3782 (37.82% affinity) for SIV GAB1 at the point of interaction with the CD4 (position (x) = 18, CF = 0.0354), which seems to illustrate weak interaction with the CD4.
4. The HIV-2 virus (Ghana-1): Ghana-1, an HIV-2 isolate [162; 190], discloses a weak affinity for the CD4, suggesting low attraction for the CD4 by the HIV-2 viruses. It has an amplitude of 0.4045, which seems to symbolize 40.45% affinity (Fig. 4.10).
5. The SIV virus (CPZ GAB1): As shown in Fig. 4.10, CPZ GAB1, an isolate of SIV [190], has an amplitude of 0.3782 at the point of interaction with the CD4 (position 18 or CF=0.0373). This symbolises a weak affinity of 37.82% for the CD4.
6. The Dual-tropic virus (SF162): The virus SF162 (Tab. 4.7), preliminarily identified as belonging to the Dual-tropic group [158], is used in this study. The Spectral Characteristics of both the M-tropic and the T-tropic SF162 are presented in Fig. 4.11. The analysis reveals that the M-tropic and T-tropic SF162 have amplitudes of 0.375 (37.5% affinity) and 0.4108 (41.08% affinity), respectively, at the point of contact with the CD4 (position 18).
Figure 4.11: The Spectral Characteristics of the HIV-1 SF162, showing amplitudes of (A) 0.375 (37.50% affinity) for the HIV-1 SF162 M-tropic and (B) 0.4108 (41.08% affinity) for the HIV-1 SF162 T-tropic at the point of interaction with the CD4 (position (x) = 18, CF = 0.0354). This appears to support the view that the mutations in the M-tropic viruses that bring about their transformation into T-tropic viruses result in an increase in affinity.
Table 4.7: HIV-1 Dual-tropic isolate and its protein identity.
Uniprot Accession No | Isolate/Strain | Type | Group | Subtype | Reference
P19550 | SF162 (Dual-tropic) | 1 | M | B | [6; 158; 197]
4.6 Discussions
It has been established that there are more mutations in the V3 region of the HIV-1 T-tropic viruses than in that of their M-tropic counterparts [6]. Clinically, T-tropic viruses, which predominate during full-blown AIDS, are not only recognized as SI, virulent and pathogenic [6; 155; 171], but are also mutants of the M-tropic viruses. The point of interaction between the HIV gp120 and the CD4 was initially recognized at the RRM spectral frequency of 0.035 [98]. This is in accord with the results obtained in this study.
Based on the preliminary clinical findings, the HIV and SIV isolates engaged in the studies are first categorized into four main groups: HIV-1 T-tropic viruses, HIV-1 M-tropic viruses, HIV-2 and SIV. The HIV-1 T-tropic and M-tropic viruses are listed in Tabs. 4.1 and 4.2, respectively, whereas the proteins in both HIV-2 and SIV are listed in Tab. 4.3. The Consensus Frequencies (CFs) of the groups are obtained using RRM, as presented in Tab. 4.4. The amplitude of each isolate is also derived using the same technique. The amplitudes of the groups and isolates at the CF depict the degrees of binding interaction with the CD4. This is because EIIP is based on bio-recognition and bio-attachment of proteins (affinity) [35; 43; 98; 115]. These results are further used to investigate the mechanism by which HIV infection advances to AIDS.
Out of the four classes of HIV and SIV, namely HIV-1 T-tropic and M-tropic viruses, HIV-2 and SIV isolates, only the HIV-1 T-tropic viruses are found to bind effectively to the host target protein (CD4). The others are found to have weak affinity for the CD4. These are demonstrated in Figs. 4.5-4.8 and Tabs. 4.1-4.5. The CF of the CD4 (Fig. 4.4) and that of the HIV-1 T-tropic viruses (Fig. 4.5) appear to be at the same frequency. Both achieve a high amplitude of 1.000 at the CF, which seems to suggest 100% affinity for each other. This is unlike the HIV-1 M-tropic viruses, which have an affinity of 0.1916% (Fig. 4.6). HIV-2 isolates also demonstrate weak affinity for the CD4 (0.8874%), as shown in Fig. 4.7, while SIV isolates show weak attraction to the CD4, as illustrated by the 13.08% degree of affinity (Fig. 4.8).
Using one isolate from each class, the binding interactions that exist between these classes and the CD4 are further studied, as shown in Figs. 4.9 and 4.10. The HXB3 isolate, which belongs to the HIV-1 T-tropic viruses, is found to have the highest affinity (96.55%) for the CD4. The degrees of binding between the CD4 and the isolates from the other classes are found to be low. For example, YBF30, an HIV-1 M-tropic virus, has 20.45%; Ghana-1, an HIV-2 isolate, has 40.45%; and CPZ GAB1, which is of the chimpanzee stock (SIV), has 37.82%.
HIV T-tropic viruses are known to dominate the late stage of HIV/AIDS [6; 151]. At this late stage, the CD4 is greatly depleted. They are also known to have more mutations than the M-tropic viruses. The M-tropic viruses have been identified to take over the viral population at the early, asymptomatic and sero-conversion stage [6]. Transformation of the M-tropic viruses into the T-tropic viruses has been associated with mutations in the V3 domain of the gp120 of the M-tropic viruses [152].
As HIV multiplies rapidly, it depletes the CD4+ cell (CD4). In effect, the CD4, which is the only source of nutrients for the HIV, diminishes speedily. However, HIV must survive this CD4 scarcity, and what it does in this situation is found to be responsible for the transformation of HIV infection into AIDS. Between the early stage of the HIV infection and the late, full-blown phase, there is an intermediate stage [6]. We therefore propose that after this transitional phase, when equilibrium appears to have been reached between the fast-multiplying HIV and the fast-depleting CD4 (the viruses' major source of nutrient), the HIV M-tropic strains undergo mutational changes in the V3 and other domains and transform into T-tropic viruses, which have more affinity for the CD4. This affinity empowers them to sustain their grip on the continually dwindling CD4, resulting in continued HIV replication and destruction of the CD4. This eventually results in the total erosion of immune protection that clinically manifests as AIDS.
The difference in the degrees of affinity between the two components of the Dual-tropic viruses and the CD4 is also demonstrated in this study using SF162. SF162 dual-tropism arises as a consequence of two point mutations (I276R and A281V) [158]. As shown in Fig. 4.11, the SF162 M-tropic strain has an amplitude of 0.375, which seems to symbolise 37.5% affinity for the CD4, while the SF162 T-tropic strain has an amplitude of 0.4108, which appears to suggest 41.08% affinity for the CD4. It is therefore observed that the two point mutations in the SF162 M-tropic strain raise the affinity for the CD4 by about 4 percentage points. This finding appears to support the proposed mechanism of the transformation of the viral infection (HIV) into AIDS as a result of enhanced affinity for the CD4 due to the mutations in the M-tropic viruses that transform them into T-tropic strains.
It is pertinent to note that the materials and the technique utilized in this study have been inherently validated. This is because the protein residues used are obtained from clinically studied and referenced experiments; they are not generated. In addition, the ASS engaged, EIIP, has been clinically and computationally derived and deposited in the database [3; 35]. The DSP technique involved, namely RRM, has been validated and employed in the study of over 1000 proteins [35].
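The size of the SF162 affinity change can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the two amplitudes reported in Fig. 4.11 and assumes, as the chapter does, that an amplitude scaled to a maximum of 1.0 can be read directly as a percentage affinity. The variable and function names are hypothetical.

```python
# Amplitudes at the CD4 interaction point (position 18), as reported in Fig. 4.11.
m_tropic_amplitude = 0.375   # SF162 M-tropic
t_tropic_amplitude = 0.4108  # SF162 T-tropic


def to_percent(amplitude):
    """Read a normalised amplitude (maximum 1.0) as a percentage affinity."""
    return amplitude * 100.0


# Absolute change, in percentage points of affinity.
difference_points = to_percent(t_tropic_amplitude) - to_percent(m_tropic_amplitude)

# Relative change, expressed against the M-tropic starting value.
relative_increase = (t_tropic_amplitude - m_tropic_amplitude) / m_tropic_amplitude * 100.0

print(f"Absolute difference: {difference_points:.2f} percentage points")
print(f"Relative increase:   {relative_increase:.2f}%")
```

The absolute difference comes to roughly 3.6 percentage points, which is the figure rounded to "about 4%" in the text; relative to the M-tropic value the increase is closer to 9.5%.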
4.7 Conclusions
In this study, the degrees of affinity that exist between the host CD4 and the HIV-1 T-tropic and M-tropic viruses, HIV-2 and SIV isolates are computationally assessed. The Resonant Recognition Model (RRM), which has been shown to be a useful method in bioinformatics, is engaged. The aim is to identify which group has the ability to sustain its hold on the CD4 as it dwindles during the late stage of the infection, and to help define the physiological role played by the mutation-induced switch from HIV-1 M-tropic to T-tropic viruses in the transformation of HIV to AIDS.
The results reveal that only the T-tropic viruses have increased and enhanced affinity for the CD4. The outcome of this study also suggests that as HIV progresses to AIDS and the CD4 depletes, HIV-1 M-tropic viruses undergo a life-saving transformation into T-tropic viruses by means of mutations in their V3 domain, resulting in elevated affinity and an enhanced grip on the dwindling CD4. This leads to further destruction and depletion of the CD4 and the eventual collapse of the immune system, which manifests itself as AIDS.
Using these intrinsically validated materials and technique, the study presented in this Chapter appears to identify the role played by the HIV-1 T-tropic viruses in the HIV transformation to AIDS. HIV/AIDS therapeutic intervention has been associated with pathogenic processes [198] and therefore requires a good understanding of them. Apart from helping to elucidate the mechanism of the progression of HIV to AIDS, which has been described as poorly understood, this study attempts to shed light on the HIV/AIDS disease course and to help re-strategise developmental plans for the successful management of HIV/AIDS.
Based on the findings, we recommend that the development and subsequent application of drugs, preventive devices and vaccines that target the V3 domain of the HIV at the earliest stage of the infection be considered as another approach to be incorporated into HIV/AIDS management procedures. This is necessary since it has been hypothesized that early initiation of HIV/AIDS treatment preserves immune function [199; 200] and reduces transmission [201].
Chapter 5
Resonant Recognition-Based Characterization and Identification of HIV Tropic and Phenotypic Associations
5.1 Summary
Using the Resonant Recognition Model (RRM), the tropic and phenotypic associations of the HIV and the Simian Immunodeficiency Virus (SIV) are investigated, and the associations of isolates with unidentified groups are further predicted. This is achieved by studying the degree of affinity between these isolates and their host target protein, Cluster of Differentiation 4+ T cells (CD4+ T cells). This study is necessary since there is a large deposit of isolates of unknown group. In addition, knowledge of the physiological associations of micro-organisms and viruses has helped in designing interventions and managing treatments. Part of the findings obtained in this research is published as: Nwankwo N, Seker H. 2010. "Assessment of the Binding Characteristics of Human Immunodeficiency Virus Type 1 Glycoprotein120 and Host Cluster of Differentiation4 Using Digital Signal Processing". Conf. Proc. IEEE BIBE 2010:289-290.
In this study, 53 HIV and SIV isolates are first categorized into tropic and phenotypic associations based on clinical findings. They are further engaged in the prediction of their associations using the Resonant Recognition Model (RRM) and their sequence information. This is in order to find out how this approach will help in designing and developing therapies, therapeutic management methods and biomedical tools.
The results disclose that 14 of the 19 HIV-1 T-cell lymphocyte-loving (T-tropic) viruses possess more than 50% affinity for the CD4. They are also identified to have Syncytium-Inducing (SI) capacity. The outcomes also reveal that 7 out of the 12 HIV-1 Macrophage-tropic (M-tropic) viruses have less than 50% affinity for the CD4. Only one isolate is found to belong to the SI phenotypic group with high affinity for the CD4. The HIV-2 and SIV are observed to demonstrate low affinity for the CD4. These results are found to agree with preliminary clinical findings. It is also disclosed in this study that most HIV-1 T-tropic viruses are SI while their HIV-1 M-tropic counterparts are NSI.
This approach is further used to identify connections between all the isolates and the host organisms. For example, it is recognized in this study that an HIV isolate obtained from Gabon, called OYI, and an isolate from Zaire, referred to as Z6, share the same highest point of interaction with an American isolate called CDC-451. This tends to suggest trans-Atlantic transmission.
The results obtained in this study, which computationally cluster HIV isolates into groups, are expected to help in the management not only of HIV/AIDS, which has been known to be refractory to treatment, but also of other diseases including cancer. It is also envisaged that the approach engaged in this investigation will help design and develop therapies for various ailments.
5.2 Introduction
Knowledge of the patient's Chemokine (C-C motif) Receptor status is vital in the management of HIV/AIDS. Therefore, its clinical assessment or computational prediction has become essential. The Enhanced Sensitivity version of the Trofile assay for assessing HIV co-receptor usage has been recognized to be successful [202]. This has resulted in the application of Maraviroc, a CCR5 antagonist, in HIV/AIDS management. Based on the host C-C motif Receptor usage, HIV is divided into two main groups: CCR5 users and CXCR4 users. Knowledge of the HIV co-receptor usage was found to have enhanced HIV/AIDS management [202]. This supports the view that understanding the categories of the viruses is vital in designing and developing interventions and treatment plans.
In this study, a Bioinformatics technique called the Resonant Recognition Model (RRM) is applied to HIV and SIV in order to help categorize them by T-Cell Lymphocyte and Macrophage usage as well as by their phenotypic associations. The aim is to find out whether this knowledge will reveal any treatment, therapeutic strategy or other relationships between and amongst the viruses and the host species, using consensus bio-functionality identification.
Binding interactions between the HIV gp120 from several isolates and the host CD4 have been identified by means of RRM [35], [98], at a spectral frequency of 0.035 [98]. Also, the domains responsible for the binding interaction have been recognized in both the HIV [203], [35] and the CD4, the latter as amino acid positions 40-60 of the D1 [204] and 41-64 [205]. Peptidic fragments of these regions and peptides with similar spectral characteristics have been identified and employed in the attempted design of HIV/AIDS vaccines [203], [35], including a nonapeptide (NAKTIIVOL) [58].
HIV and SIV are classified into Tropic and Phenotypic groups. The tropic classification of the HIV consists of the T-Cell Lymphocyte-loving (T-tropic) and Macrophage-loving (M-tropic) groups. The other categorization, which is based on HIV phenotypic associations, comprises Syncytium-Inducing (SI) and Non Syncytium-Inducing (NSI) viruses. The HIV T-tropic viruses are known to be virulent because they are recognized to be cytopathic (cell-damaging) [162]. They dominate the late stage of the HIV/AIDS disease [151]. On the other hand, Macrophage-loving HIV isolates are identified as less virulent [155] and dominate the viral population of infected individuals at the early stage. Some isolates, such as HIV-1 SF162, are known to possess both M-tropic and T-tropic characteristics and are therefore referred to as Dual-tropic viruses [158].
HIV-2 and SIV are recognized clinically to be CD4-independent and latent, and are referred to as non-progressors, as they circumvent the gp120-CD4 binding interaction. They rather form a distinct conformation that enables contact with transmembrane co-receptors [196].
Viruses that infect cells called MT cells are known to have Syncytium-Inducing (SI) capacity [206]. This infection is recognized to result in the formation of a large multi-nucleated cell called a Syncytium [207]. Syncytium induction has been found to have no association with the HIV viral load and is therefore regarded as a natural characteristic of the HIV, called a phenotype [207]. On the other hand, it has been associated with CD4 depletion [157].
In order to find out whether a relationship can be established between the HIV/SIV tropic and phenotypic associations and the viral degrees of affinity to the CD4, the protein residues of the HIV gp120 of the 53 isolates are obtained from the UNIPROT database [79] and analyzed using the RRM technique. The results are further used to classify the viruses and their host organisms, as well as to determine the common relationships that exist amongst them. This is in order to find out whether the biological functionalities could be targets for therapeutic interventions. It is also intended to categorize these organisms and predict the associations of those with unknown classes.
RRM is a physio-mathematical approach [35], which has helped in the study of proteins such as Anthrax Protective Antigen (PA), Anthrax Toxin Receptor/Tumour Endothelial Marker 8 (ATR/TEM8) and Capillary Morphogenesis Protein 2 (CMG2) [37], and the cancer suppressor proteins p53 and pR13 that resulted in Oncophage vaccine developments [43]. It has been used in the study of Heat Shock Proteins (HSP) [97]; the tumour-activating DNA virus Simian vacuolating virus 40, or Simian virus 40 (SV40), and its enhancer [37]; the hemagglutinin (HA) membrane glycoprotein of the influenza virus [43], [97]; and the hormone prolactin [45]. In addition, it has helped in the discovery and development of therapies for diseases linked to these proteins. Such diseases include Swine Flu [93], Cancer [43], [97], [44], Diabetes [45], Anthrax [37] and HIV/AIDS [35], [98]. Other applications of RRM include investigations into Fibroblast Growth Factors (FGF) [35], [98] and the antigenic drift and evolutionary road-map of the Influenza virus [36].
In the subsequent sections, the method engaged, the results obtained, the inferences drawn from the results and the conclusions reached are presented.
5.3 Materials and Method
5.3.1 Materials
Previous clinical experiments carried out on the CD4 from HIV host organisms and the gp120 from HIV and SIV isolates are first studied. This is to provide insight into the relationships that exist between them and to help categorize them based on clinical experiments. The amino acid sequences of the Cluster of Differentiation 4 (CD4) from 24 host organisms and of the HIV surface protein (also referred to as glycoprotein120) from 53 HIV and SIV isolates are retrieved from UNIPROT [79]; their accession numbers are summarized in Tabs. 5.1 - 5.7. The amino acid sequence of the HIV-1 SF162 T-tropic, obtained from literature [158], is constructed from the M-tropic isolate as deposited in UNIPROT, and is analyzed together with the other sequences using RRM.
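Constructing the SF162 T-tropic sequence from the deposited M-tropic sequence amounts to applying the two point mutations reported in [158] (I276R and A281V). A minimal sketch of that step is given below. The function name is hypothetical, the toy sequence is not SF162, and it is an assumption that mutation positions are 1-based indices into whichever sequence is passed in; the numbering scheme used in [158] should be confirmed before applying this to the UNIPROT entry.

```python
import re


def apply_point_mutations(sequence, mutations):
    """Return a copy of `sequence` with substitutions such as 'I276R' applied.

    Each mutation is written as <wild-type residue><1-based position><new residue>.
    The wild-type residue is checked so that a numbering mismatch is reported
    instead of silently producing the wrong mutant.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"Unrecognised mutation format: {mutation}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        index = position - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"Position {position} is outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(
                f"Expected {wild_type} at position {position}, found {residues[index]}"
            )
        residues[index] = replacement
    return "".join(residues)


# Toy illustration only (not the SF162 sequence): I3R and A4V on a 5-residue peptide.
toy_parent = "MKIAV"
toy_mutant = apply_point_mutations(toy_parent, ["I3R", "A4V"])
print(toy_mutant)
```

For the real case the call would take the retrieved M-tropic sequence and the list `["I276R", "A281V"]`; the wild-type check fails loudly if the positions do not line up with the deposited sequence.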
5.3.2 Resonant Recognition Model (RRM)
The Resonant Recognition Model (RRM) is used to investigate the Tropic and Phenotypic associations of the HIV and SIV. The procedure is applied to the gp120 of 53 isolates from HIV and SIV, as well as to the CD4 from 24 HIV host organisms, in order to determine their affinity. The results are further engaged to evaluate the relationships between their levels of affinity and their Tropic and Phenotypic associations.
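The general shape of an RRM analysis can be sketched as follows: each residue is replaced by its EIIP value, the numerical series is zero-padded to the length of the longest sequence, a discrete Fourier transform is taken, and the amplitude spectra of the sequences are multiplied point by point and scaled so that the largest peak equals 1.0. The code below is a minimal illustration under stated assumptions and is not the implementation used in this thesis: the EIIP values are those commonly tabulated in the RRM/ISM literature and should be checked against [35], subtracting the mean before the transform is an assumption, and all function names are hypothetical. A plain-Python DFT is used so that the sketch has no external dependencies.

```python
import cmath
import math

# EIIP values (Rydberg) as commonly tabulated in the RRM/ISM literature.
# Assumption: verify against the source cited as [35] before any real use.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def eiip_signal(sequence, length):
    """Map residues to EIIP values, remove the mean, and zero-pad to `length`."""
    values = [EIIP[residue] for residue in sequence]
    mean = sum(values) / len(values)
    centred = [value - mean for value in values]  # assumption: mean removal
    return centred + [0.0] * (length - len(centred))


def amplitude_spectrum(signal):
    """Amplitudes |X(k)| for k = 1 .. N/2; list index 0 holds position 1."""
    n = len(signal)
    amplitudes = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        amplitudes.append(abs(coefficient))
    return amplitudes


def multiple_cross_spectrum(sequences):
    """Point-wise product of amplitude spectra, scaled so the maximum is 1.0."""
    n = max(len(sequence) for sequence in sequences)
    product = None
    for sequence in sequences:
        spectrum = amplitude_spectrum(eiip_signal(sequence, n))
        if product is None:
            product = spectrum
        else:
            product = [p * a for p, a in zip(product, spectrum)]
    peak = max(product)
    return [value / peak for value in product], n


def peak_position(spectrum):
    """1-based position of the largest peak in the spectrum."""
    return spectrum.index(max(spectrum)) + 1


# Toy sequences with a period of two residues, so the common peak sits at N/2.
toy_sequences = ["DLDLDLDL", "RIRIRIRI"]
spectrum, n = multiple_cross_spectrum(toy_sequences)
pp = peak_position(spectrum)
print(f"Peak position {pp} of N = {n}, frequency {pp / n:.4f}")
```

In this sketch the normalised amplitude read at a chosen position (for example position 18) plays the role of the "affinity" values reported in the tables of this chapter.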
5.4 Results
The results of these experiments are presented in Figs. 5.1 - 5.6 and Tabs. 5.1 - 5.7. They are further discussed in the subsequent sections. The relationship between the Consensus Frequency (CF) and the Peak Position (PP) is expressed as shown in Eq. 5.1.
$$\mathrm{CF} = \frac{\mathrm{PP}}{N} \qquad (5.1)$$
where N is the length of the longest protein. The longest sequences (N) for the gp120 and the CD4 are 508 and 482, respectively. The CFs of the HIV gp120, the host CD4, and the combination of the gp120 from HIV and SIV (Multiple Cross Spectral analysis) are all obtained at position 18. Using Eq. 5.1, the CF for the gp120 is 0.0354, while that for the CD4 is 0.0373. The CF of the Multiple Cross Spectral analysis of the gp120 from HIV and SIV is also found to be 0.0354. These results appear to be in accord with initial outcomes [98]. The results are further used to predict the physiological properties of HIV, including tropism and phenotypic associations. T-Cell-loving HIV-1 isolates are known to be cytopathic (virulent) and also to maintain firm attachment to the CD4 [162; 164]. This is unlike the Macrophage-loving HIV-1 and the non-progressor viruses (HIV-2 and SIV) [196].
The vertical (y) axis shows the amplitude and is scaled relative to the maximum value. Therefore, the maximum value obtainable on the y-axis is 1.0 or 100%. The EIIP amino acid scale used in this study is a function of bio-recognition and binding interaction (affinity) [35]. As a result, the amplitudes are expressed as percentages, representing the degree of binding interaction or affinity. Based on the amplitudes, the levels of affinity between the HIV and SIV isolates and the CD4 are obtained.
Tab. 5.1 presents the result of the Spectral Analysis of the host CD4, comprising 24 species. The results for the HIV-1 isolates are divided into two groups: Tab. 5.2 summarizes the outcome of the analysis of the HIV-1 T-tropic viruses, while Tab. 5.3 sums up the results for the M-tropic viruses. In addition, the results for the two HIV Dual-tropic viruses are shown in Tab. 5.4. The findings regarding the affinity of the HIV-1 with unknown tropism, HIV-2 and SIV are presented in Tabs. 5.5 - 5.7.
Fig. 5.1 presents the Cross Spectral analyses of the HIV gp120 and the CD4, and the Multiple Cross Spectral analysis of both HIV and SIV. Each shows its CF at position 18. As stated earlier, they present a CF of 0.0354 for the HIV/SIV gp120 and 0.0373 for the CD4. Both results are similar and in accord with preliminary studies [98].
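The consensus frequencies quoted above follow directly from Eq. 5.1. The short sketch below reproduces them from the peak position (18) and the longest sequence lengths given in the text (508 for the gp120 and 482 for the CD4); the function name is hypothetical.

```python
def consensus_frequency(peak_position, longest_length):
    """Eq. 5.1: CF = PP / N, with N the length of the longest sequence."""
    return peak_position / longest_length


cf_gp120 = consensus_frequency(18, 508)  # HIV/SIV gp120
cf_cd4 = consensus_frequency(18, 482)    # host CD4

print(f"gp120: CF = {cf_gp120:.4f}")
print(f"CD4:   CF = {cf_cd4:.4f}")
```

The same relation converts any peak position listed in the tables into its frequency; for example, position 155 on the gp120 scale gives 155/508, or about 0.305.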
Table 5.1: Results of the spectral characteristics of the CD4 from the host organisms, demonstrating the degree of affinity between the HIV and the host species. MP stands for Maximum Peak and NF for Not Found.
Protein | Host Organism | Affinity to CD4 | MP Freq (position) | MP
P01730 | Humans | 52.76% | 0.141(68) | 0.7235
P06332 | Mouse | 72.19% | 0.064(31) | 0.7640
P05540 | Rat | 77.08% | 0.228(110) | 0.9434
Q08338 | Green monkey | 76.99% | 0.210(101) | 0.8395
P16003 | Rhesus macaque | 64.20% | 0.210(101) | 0.7053
P16004 | Chimpanzee | 52.80% | 0.141(68) | 0.7640
P33705 | Dog | 50.70% | 0.089(43) | 0.6123
Q08336 | Mangabey | 55.41% | 0.417(201) | 0.5797
Q08339 | Dancing monkey | 52.30% | 0.210(101) | 0.6592
Q08340 | Pig-tailed monkey | 58.88% | 0.210(101) | 0.7511
P46630 | Rabbit | 38.95% | 0.477(230) | 0.7819
P79185 | Crab eating monkey | 66.38% | 0.220(106) | 0.7272
Q9XS78 | Beluga whale | 46.67% | 0.125(60) | 0.6443
P79184 | Japanese macaque | 61.02% | 0.220(106) | 0.7113
Q29037 | Squirrel monkey | 55.70% | 0.189(91) | 0.6524
Q68AX6 | Japanese puffer-fish | 27.14% | 0.151(73) | 0.5433
Q68AX5 | Japanese puffer-fish | 29.03% | 0.064(31) | 0.5850
B3CJQ7 | European sea bass | 81.10% | 0.355(171) | 0.9793
B8YEL2 | Domestic duck | 55.65% | 0.278(134) | 0.7370
B8YEL3 | Domestic duck | 60.18% | 0.278(134) | 0.7178
Q90WB5 | Domestic duck | 65.50% | 0.093(45) | 0.7723
A7YY52 | Bovine | 29.05% | 0.050(24) | 0.6416
Q6R3N4 | Pig | 35.05% | 0.017(8) | 0.6171
Q9W6V7 | Chicken | 33.08% | 0.201(97) | 0.6429
As shown in Fig. 5.2 and Tab. 5.2, the HIV-1 isolate known as MFA is found to demonstrate an affinity of 100% at position 18, the point of contact with the CD4. An HIV-2 isolate, ST/24.1C#2, is observed to reveal a low affinity of 34.13% at position 18 (Tab. 5.6). Also, an SIV isolate known as MB66 is shown to display an affinity of 31.94% at the same position (Tab. 5.7).
Table 5.2: Results of the spectral characteristics of HIV-1 T-tropic viruses, showing the level of binding interaction that exists between each isolate and the host CD4, and the position of maximum interaction. Five isolates have maximum affinity for the CD4. MP: Maximum Peak.
Protein | Isolate | Affinity to CD4 | MP Freq (position) | MP | Phenotype | Ref.
P03377 | BRU/LAI | 84.77% | 0.240(122) | 0.8727 | SI | [166]
P03375 | BH10 | 98.14% | 0.0354(18) | 0.9814 | SI | [165; 169]
P04578 | HXB2 | 92.21% | 0.0354(18) | 0.9221 | SI | [162; 164]
P05877 | MN | 58.61% | 0.220(102) | 0.7974 | SI | [6; 160]
P18799 | NDK | 61.41% | 0.173(88) | 0.7066 | SI | [161]
P04582 | BH8 | 84.28% | 0.264(134) | 1.0000 | SI | [151]
P05880 | WMJ22 | 49.88% | 0.205(104) | 0.6359 | SI | [170]
P05879 | CDC-451 | 30.00% | 0.305(155) | 0.6854 | SI | [173; 174; 208]
P04624 | HXB3 | 96.55% | 0.0354(18) | 0.9655 | SI | [171]
P04580 | Z6 | 37.30% | 0.305(155) | 0.7347 | SI | [6; 155]
P05878 | SC | 67.74% | 0.299(152) | 0.6842 | SI | [209]
P05881 | Z321 | 62.53% | 0.232(118) | 0.9862 | SI | [172]
P04581 | ELI | 45.21% | 0.295(150) | 0.6326 | SI | [160; 165]
P31872 | WMJ1 | 48.90% | 0.299(152) | 0.5042 | SI | [167]
P12488 | BRVA | 68.30% | 0.191(97) | 0.7502 | SI | [176]
P03378 | ARV2/SF2 | 65.01% | 0.232(118) | 0.8094 | SI | [162; 163]
P19551 | MFA | 100.00% | 0.0354(18) | 1.0000 | SI | [164; 168]
P19549 | SF33 | 67.13% | 0.319(162) | 0.7557 | SI | [163]
O89292 | 93BR020 | 57.91% | 0.169(86) | 0.8310 | SI | [177; 209]
The Spectral Characteristics of the HIV isolates OYI, Z6 and CDC-451 are demonstrated in Fig. 5.3. They present affinities of 100% (OYI), 73.47% (Z6) and 68.54% (CDC-451) at the point of maximum amplitude, 155 (f=0.305), as shown in Tabs. 5.2 and 5.5. Also, Fig. 5.4 describes the Spectral Characteristics of the HIV isolates SC, 96CM-MP535 and WMJ1. These appear to signify affinities of 68.42% (SC), 54.99% (96CM-MP535) and 48.90% (WMJ1) at the point of maximum amplitude, 152 (f=0.299) (Tabs. 5.2 and 5.5).
Table 5.3: Results of the spectral characteristics of HIV-1 M-tropic viruses, showing the level of binding interaction that exists between each isolate and the host CD4, and the position of maximum interaction. One isolate, LW123, has maximum affinity for the CD4. MP: Maximum Peak.
Protein | Isolate | Affinity to CD4 | MP Freq (position) | MP | Phenotype | Ref.
Q70626 | LW123 | 82.96% | 0.0354(18) | 0.8296 | SI | [164]
P35961 | YU-2 | 40.48% | 0.348(177) | 0.7116 | NSI | [6]
P04579 | RF/HAT3 | 27.29% | 0.203(103) | 0.726 | NSI | [178]
O41803 | 92NG083 | 28.68% | 0.087(44) | 0.5723 | NSI | [181; 209]
Q75008 | ETH2220 | 26.27% | 0.104(53) | 0.6099 | NSI | [183; 210]
Q9WC60 | SE9280 | 41.03% | 0.331(168) | 0.7469 | NSI | [184]
Q9WC69 | SE9173 | 57.26% | 0.127(65) | 0.7930 | NSI | [211]
O91086 | YBF30 | 23.03% | 0.282(143) | 0.5092 | NSI | [209]
P05882 | Z84 | 20.78% | 0.146(74) | 0.5774 | NSI | [179; 180]
P20871 | JRCSF | 66.08% | 0.305(155) | 0.7856 | NSI | [160]
O12164 | 92BR025 | 61.84% | 0.169(86) | 0.6892 | NSI | [181]
O70902 | 90CF056 | 51.74% | 0.270(137) | 0.8094 | NSI | [177; 209]
Table 5.4: Results of the spectral characteristics of HIV-1 Dual-tropic viruses, showing the level of binding interaction that exists between each isolate and the host CD4, and the position of maximum interaction. MP: Maximum Peak.
Protein | Isolate | Affinity | MP Freq (position) | MP | Phenotype | Ref.
P19550 | SF162M | 37.50% | 0.440(226) | 0.6040 | NSI | [6; 158; 163; 197]
P19550 | SF162T | 41.08% | 0.440(226) | 0.6040 | NSI | [6; 158; 163; 197]
Q73372 | strain 89.6 | 66.28% | 0.317(161) | 0.8311 | SI | [6; 165]
At the point of maximum amplitude, 158 (f=0.311), the Spectral Characteristics of the HIV-1 VI850 reveal an affinity of 57.93%, while the SIV MB66 displays 69.26% (Fig. 5.5). Two SIV isolates, TAN1 and GAB1, demonstrate the same maximum point of interaction at position 200 (Fig. 5.6). These results are also displayed in Tabs. 5.5 and 5.7. Fig. 5.7 shows the Spectral Characteristics of the HIV isolates ELI, MAL and Z2/CDC-Z34, presenting affinities of 63.26% for ELI, 64.45% for MAL and 72.89% for Z2/CDC-Z34, all at the point of maximum amplitude, 150 (f=0.295) (Tabs. 5.2 and 5.5).
Table 5.5: Results of the spectral characteristics of HIV-1 with unknown tropism, showing the level of binding interaction that exists between each isolate and the host CD4, and the position of maximum interaction. MP: Maximum Peak; NF: Not Found.
Protein | Isolate | Affinity to CD4 | MP Freq (position) | MP | Phenotype | Ref.
P04583 | MAL | 29.21% | 0.295(150) | 0.6445 | NF | [170; 209]
P12489 | JH32 | 35.46% | 0.203(103) | 0.7374 | NF | [170; 209]
P20888 | OYI | 87.51% | 0.305(155) | 1.0000 | NF | [170; 209]
P31819 | KB-1/ETR | 67.19% | 0.171(87) | 0.7193 | NF |
Q9QBY2 | 96CM-MP535 | 12.36% | 0.299(152) | 0.5499 | NF | [212]
Q9QBZ8 | 97ZR-EQTB11 | 46.19% | 0.284(144) | 0.8241 | NF | [212]
Q9QBZ0 | 96CM-MP257 | 44.86% | 0.884(45) | 0.7091 | NF | [212]
Q9QBZ4 | 96CMMP255 | 27.90% | 0.226(115) | 0.8837 | NF | [212]
Q9QSQ7 | VI850 | 35.26% | 0.311(158) | 0.5793 | NF | [213]
P12487 | Z2/CDC-Z34 | 37.56% | 0.295(150) | 0.7289 | NF | [170; 209]
Table 5.6: Results of the spectral characteristics of HIV Type 2 isolates, showing the level of binding interaction that exists between each isolate and the host CD4, and the position of maximum interaction. MP: Maximum Peak.
Protein | Isolate | Affinity | MP Freq (position) | MP | Tropism | Phenotype | Ref.
P32536 | ST/24.1C#2 | 34.13% | 0.152(77) | 1.0000 | M | NSI | [187]
P15831 | D205 | 40.85% | 0.274(139) | 1.0000 | NF | NF | [188]
P24105 | CAM2 | 41.14% | 0.375(191) | 1.0000 | NF | NF | [189]
P18040 | Ghana-1 | 22.49% | 0.215(109) | 1.0000 | NF | NF | [162; 190]
Q76638 | UC1 | 42.28% | 0.207(105) | 1.0000 | M | NSI | [187; 191]
Fig. 5.8 shows the Spectral Characteristics of the CD4 from Human and Chimpanzee. It demonstrates affinities of 72.35% for the Human and 76.40% for the Chimpanzee, both at the point of maximum amplitude, 68 (f=0.141), as shown in Tab. 5.1. It also reveals affinities of 52.76% for the Human and 52.80% for the Chimpanzee at the point of interaction with the CD4 (position 18 or CF=0.0373). Fig. 5.9 displays the Spectral Characteristics of the CD4 from the Dancing, Green and Pig-tailed Monkeys. It shows affinities of 65.92% for the Dancing monkey, 76.99% for the Green monkey and 75.11% for the Pig-tailed monkey at the point of maximum amplitude, 101 (f=0.210) (Tab. 5.1).
Table 5.7: Results of the spectral characteristics of SIV isolates. MP: Maximum Peak.
Protein | Isolate | Affinity | MP Freq (position) | MP | Tropism | Phenotype | Ref.
Q02837 | AGM gr-1 | 51.42% | 0.278(141) | 1.0000 | T | SI | [192; 193]
P08810 | Mm251 | 37.73% | 0.213(108) | 1.0000 | T | SI | [193]
P17281 | cpz GAB1 | 37.82% | 0.394(200) | 0.8986 | T | SI | [179; 194]
Q1A261 | MB66 | 31.94% | 0.311(158) | 0.6926 | NF | NF | [179]
Q8AIH5 | TAN1 | 34.49% | 0.394(200) | 1.0000 | T | SI | [179; 194]
Figure 5.1: The CS analyses of the (A) HIV gp120, (B) CD4, and (C) Multiple CS of the gp120 of HIV and SIV, each showing the common point of binding interaction (CF) at position 18.
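Several of the observations above rest on isolates sharing the same position of maximum amplitude. That grouping can be made explicit with a short script. The sketch below is illustrative: it uses only a subset of the (isolate, peak position, peak amplitude) entries listed in Tabs. 5.2, 5.3, 5.5 and 5.7, and the variable names are hypothetical.

```python
from collections import defaultdict

# (isolate, position of maximum peak, maximum peak amplitude), taken from
# Tabs. 5.2, 5.3, 5.5 and 5.7. Only a subset of the isolates is listed here.
maximum_peaks = [
    ("OYI", 155, 1.0000),
    ("Z6", 155, 0.7347),
    ("CDC-451", 155, 0.6854),
    ("JRCSF", 155, 0.7856),
    ("SC", 152, 0.6842),
    ("96CM-MP535", 152, 0.5499),
    ("WMJ1", 152, 0.5042),
    ("ELI", 150, 0.6326),
    ("MAL", 150, 0.6445),
    ("Z2/CDC-Z34", 150, 0.7289),
    ("VI850", 158, 0.5793),
    ("MB66", 158, 0.6926),
    ("cpz GAB1", 200, 0.8986),
    ("TAN1", 200, 1.0000),
    ("MFA", 18, 1.0000),
]

# Collect isolates by the position at which their spectrum peaks.
groups = defaultdict(list)
for isolate, position, _amplitude in maximum_peaks:
    groups[position].append(isolate)

# Report only positions shared by more than one isolate.
for position in sorted(groups):
    members = groups[position]
    if len(members) > 1:
        print(f"Position {position}: {', '.join(members)}")
```

Extending the list to all 53 isolates would show every shared peak position at once, rather than only those picked out in Figs. 5.3 - 5.7.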
Figure 5.2: The SC of the three groups, (A) HIV-1 MFA with the highest amplitude of 1.0; (B) HIV-2, ST/24.1C#2 with low amplitude of 0.3413; and (C) SIV, MB66 with also low amplitude of 0.3194, respectively at the point of common binding interaction, CF (position 18 or CF=0.0354).
Figure 5.3: The SC of the (A) HIV-1 OYI, (B) Z6 and (C) CDC-451 showing amplitude of 1.00, 0.7347 and 0.6854, respectively at the same point of maximum amplitude, 155 (f=0.305).
5.5
Discussion
Fifty-three isolates of HIV and SIV, which have been studied using clinical approaches as shown in Tab.s 5.1 - 5.7, are analyzed by means of RRM. RRM has been validated and employed in the study of over 1000 proteins [35]. In addition, the ASS engaged, EIIP, has been clinically and computationally derived and deposited in the database [3; 35]. As a result, validation of the technique and the materials is acquired innately. The isolates are analyzed in order to categorize them into tropic and phenotypic groups using their level of affinity to the CD4. Of these HIV/SIV isolates, 23 have clinically been recognized as T-tropic viruses while 14 belong to the M-tropic group. There are also 2 dual-tropic viruses, SF162 and Strain 89.6, which are also analyzed. In addition, the tropic association of 10 isolates could not be identified from literature. Amongst the HIV-1 viruses, 19 are T-tropic while 12 are M-tropic viruses. Their degrees of affinity with the CD4 engaged by HIV-1 isolates are computationally obtained and used to categorize them into tropic as well as phenotypic groups. The HIV-1 T-tropic viruses are known to be virulent [168] while the M-tropic counterparts are recognized to be less destructive to the CD4 [196].

Figure 5.4: The SC of the (A) HIV-1 SC, (B) 96CM-MP535 and (C) WMJ1 showing amplitude of 0.6842, 0.5499 and 0.4890, respectively at the same point of maximum amplitude, 152 (f=0.299).

Figure 5.5: The SC of the (A) HIV-1 VI850 and (B) SIV MB66 showing amplitude of 0.5793 and 0.6926, respectively at the same point of maximum amplitude, 158 (f=0.311).

Figure 5.6: The SC of the (A) SIV TAN1 and (B) SIV GAB1 showing amplitude of 1.0000 and 0.8986, respectively at the same point of maximum amplitude, 200 (f=0.394).

Figure 5.7: The SC of the (A) HIV ELI, (B) MAL and (C) Z2/CDC-Z34 showing amplitude of 0.6326, 0.6445 and 0.7289, respectively at the same point of maximum amplitude, 150 (f=0.295).

Figure 5.8: The SC of the CD4 from (A) Human and (B) Chimpanzee showing amplitude of 0.7235 and 0.7640 at the same point of maximum amplitude, 68 (f=0.141).

Figure 5.9: The SC of the CD4 from (A) Dancing Monkey, (B) Green Monkey and (C) Pig-tailed Monkey showing amplitude of 0.6592, 0.7699 and 0.7511, respectively at the same point of maximum amplitude, 101 (f=0.210).
First, the CF of the HIV/SIV and CD4 are identified using Cross-spectral analysis. The amplitude of each virus at the CF, which represents the degree of affinity, is then used in the analysis. The results obtained are discussed below:
5.5.1
Cross-Spectral Analysis:
In this study, it is revealed that the CF of the HIV gp120 is 0.0354 while that of the CD4 is 0.0373. The CF for the combination of the HIV and SIV gp120 is also derived as 0.0354. In a preliminary study, the Cross-Spectral analyses of the HIV gp120 and CD4 were identified to yield the same Consensus Frequency of 0.035 [98]. That outcome appears to agree with our findings. Based on these findings, the degree of affinity between the HIV isolates and the CD4, which is a function of their amplitudes, is used to study the tropic and phenotypic associations of the viruses. The results are also used to examine other relationships that may exist amongst the viruses and the host species.
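The cross-spectral step described above can be sketched roughly as follows. This is a minimal illustration and not the implementation used in this study: the function names are hypothetical, the two input sequences are toy strings rather than gp120 or CD4, and the cross-spectrum is taken here as the point-wise product of the amplitude spectra of the EIIP-encoded sequences, normalized so that the highest peak equals 1. The helper `position_to_frequency` assumes that a reported frequency is the peak position divided by the number of points in the transform; a length of 508 points is inferred from the reported pair (position 18, f=0.0354) and is not stated in the text.

```python
import cmath

# EIIP values for the residues used in the toy sequences below,
# as listed in Table 6.1 of this thesis.
EIIP = {"G": 0.0050, "K": 0.0371, "L": 0.0000,
        "S": 0.0829, "T": 0.0941, "V": 0.0057}


def amplitude_spectrum(seq, n_points):
    """Return |DFT| of the EIIP-encoded, zero-padded sequence for k = 1 .. n_points // 2."""
    signal = [EIIP[aa] for aa in seq] + [0.0] * (n_points - len(seq))
    spectrum = []
    for k in range(1, n_points // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * n / n_points)
                    for n, v in enumerate(signal))
        spectrum.append(abs(coeff))
    return spectrum


def cross_spectrum(seqs):
    """Point-wise product of the amplitude spectra, scaled so the top peak is 1.0."""
    n_points = max(len(s) for s in seqs)
    cross = [1.0] * (n_points // 2)
    for s in seqs:
        cross = [c * a for c, a in zip(cross, amplitude_spectrum(s, n_points))]
    peak = max(cross)
    return [c / peak for c in cross], n_points


def consensus_frequency(seqs):
    """Return (position, frequency) of the highest cross-spectral peak."""
    cross, n_points = cross_spectrum(seqs)
    position = cross.index(max(cross)) + 1  # positions are counted from k = 1
    return position, position / n_points


def position_to_frequency(position, n_points):
    """Assumed relation between a peak position and its frequency."""
    return position / n_points


# Toy sequences (hypothetical): both contain a period-2 component,
# so their only shared peak lies at position 8 of 16, i.e. f = 0.5.
position, frequency = consensus_frequency(["ST" * 8, "GLKV" * 4])
print(position, frequency)
print(round(position_to_frequency(18, 508), 4))
```

In this sketch the amplitude of an individual sequence at the consensus position, read from its own normalized spectrum, would play the role of the degree of affinity used in the rest of this section.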
5.5.2
HIV-1 Viruses
5.5.2.1
MFA Isolate (HIV-1 T-tropic Virus):
As shown in Fig. 5.2, the MFA isolate has the highest amplitude value of 1.0 at position 18 (CF=0.0354), signifying 100% affinity in terms of bio-recognition and binding interaction with CD4. RRM is known to engage EIIP, an amino acid parameter which symbolizes affinity [214]. In the clinical experiments, it has been shown that MFA has the highest (100%) affinity for CD4 [162; 164]. In addition, it has been demonstrated that the HIV-1 MFA isolate induces the most cytopathogenicity, which describes its ability to cause damage or disease to cells [168]. The outcome of this computational investigation therefore appears to correlate with the clinical experimental findings. The MFA isolate has been recognized as a candidate or reference interactor [129]. Fig. 5.2 also shows the varying affinities of the three classes, the HIV-1, HIV-2 and SIV, to CD4 using one isolate from each class. While (A) shows HIV-1 MFA with the highest amplitude of 1.0, symbolizing 100% affinity, (B), an HIV-2 isolate
called ST/24.1C#2, has an amplitude of 0.3413, signifying 34.13% binding interaction, and (C), an SIV isolate called MB66, has 31.94% affinity, all to the CD4.
5.5.2.2
HIV-1 T-tropic Viruses:
The degrees of affinity to the CD4 by HIV-1 T-tropic isolates obtained here by means of RRM are evaluated in line with the initial clinical studies. Tab. 5.2 reveals that 14 out of 19 HIV-1 T-tropic viruses displayed more than 50% affinity for the CD4. Only five T-tropic viruses demonstrated weak affinity for CD4. They are WMJ22 (49.88%), CDC-451 (30.00%), Z6 (37.30%), ELI (45.21%) and WMJ1 (48.90%). All the isolates in this group are found to be SI. The results appear to suggest that our findings on the affinity of these HIV-1 T-tropic viruses to CD4 concur with the high level of virulence and pathogenicity with which they are clinically associated [162; 164]. It has also been confirmed that HIV-1 T-tropic viruses utilize both the CD4 and the CXCR4 co-receptor.
5.5.2.3
HIV-1 M-tropic Viruses:
Amongst the HIV M-tropic viruses presented in Tab. 5.3, five isolates have affinity for the CD4 higher than 50%. LW123 has an affinity as high as 82.96% for the CD4, while 92BR025 has 61.84%. The degrees of affinity of the JRCSF and SE9173 isolates are 66.08% and 57.26%, respectively. In addition, 90CF056 has 51.74% affinity for the CD4. Other isolates do not have remarkable affinity for the CD4. LW123 is identified as a Macrophage-loving (M-tropic) virus, although it is a clone obtained from a laboratory worker (LW) infected with the T-tropic LAI/IIIB isolate [164]. This may explain why its amplitude at the point of contact with the CD4 is high though it is an M-tropic virus (see Tab. 5.3). Clinically, it has been recognized that M-tropic viruses demonstrate low virulence, less pathogenicity and NSI capacity [155; 162]. They are known to infect the CD4 using CCR5 [197]. They also maintain low attraction for CXCR4 though they have high affinity for CCR5 co-receptors. They constitute the viral population during the asymptomatic as well as sero-conversion stage but dwindle as
HIV progresses to AIDS [6; 155]. They respond reasonably to treatment with CCR5 antagonists such as Maraviroc [39; 40].
5.5.2.4
Dual HIV-1 Isolates:
Two Dual-tropic viruses are engaged in the study. They are Strain 89.6 and SF162. SF162 has a dual-tropic characteristic arising from two point mutations (I276R and A281V) [158]. The SF162 M-tropic variant has 37.50% affinity for the host CD4. As shown in Tab. 5.4, the SF162 T-tropic variant has about 4% increased affinity for the CD4 (41.08%) arising from the two mutations. As shown in the same table, it has also been identified as NSI. Strain 89.6, which is an SI virus, is found to have 66.28% attraction for the CD4. The significance of the effect of the two point sequence variations in SF162 in the transformation of the SF162 M-tropic variant to the SF162 T-tropic variant, and the resultant progression of HIV infection to AIDS, is demonstrated in Chapter 4.
5.5.3
HIV-2 Isolates:
It is observed that all the HIV-2 isolates including ST/24.1C#2 (Fig. 5.2) have low amplitude values at the Consensus Frequency (Tab. 5.6). The low peak value displayed by the HIV-2 seems to symbolize weak affinity for CD4. This weak affinity for the CD4 is in accord with an earlier experimental finding, which recognized that HIV/AIDS infections by HIV-2 strains are CD4-independent. This is because HIV-2 isolates are found to circumvent the gp120-CD4 binding by forming a distinct conformation that enables contact with transmembrane co-receptors [196].
5.5.4
SIV Isolates:
It is also revealed in this study (Tab. 5.7) that all SIV isolates including MB66 (Fig. 5.2) demonstrated weaker binding characteristics with the CD4. This is a result of the low amplitude observed at position 18, the Consensus Frequency (CF=0.0354), by all the SIV isolates. This is in accord with the experimental findings that HIV/AIDS infections by SIV strains get around the gp120-CD4 binding by forming a distinct conformation that enables contact with transmembrane co-receptors [196].
5.5.5
HIV Phenotypic Associations:
Although HIV phenotypic association has been regarded as an intrinsic feature that does not have any relationship with the viral load, hence the degree of virulence [207], it is observed that, with the exception of CDC-451 and Z6, all HIV-1 T-tropic viruses are SI. Also, all M-tropic viruses are NSI, excluding LW123, a clone of a T-tropic virus. This may therefore suggest that a higher level of cytopathogenicity is associated with the SI phenotype. This result concurs with the clinical finding of another group, which reported that NSI is linked to less virulence and pathogenicity [155; 162]. The results obtained in these experiments appear to suggest a strong relationship between tropic and phenotypic associations. This is because, as shown in Tab. 5.2, the 14 HIV-1 T-tropic viruses which demonstrated more than 50% affinity with the CD4 are known to belong to the SI group. Additionally, one M-tropic isolate, LW123, which has 82.96% binding interaction with the CD4, is recognized to belong to the SI category [164]. CD4 depletion is also found to be directly related to SI capacity. As a result, the increase in affinity produced by the two point mutations that transform the SF162 M-tropic variant into the SF162 T-tropic variant may translate into SI property. This needs to be verified clinically.
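The proportions behind these observations can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the counts reported for Tab.s 5.2 and 5.3 (19 HIV-1 T-tropic isolates of which 14 exceed 50% affinity, and 12 HIV-1 M-tropic isolates of which 5 exceed 50%), and the variable and function names are hypothetical.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts reported in this chapter for the HIV-1 isolates.
T_TROPIC_TOTAL, T_TROPIC_ABOVE_50 = 19, 14
M_TROPIC_TOTAL, M_TROPIC_ABOVE_50 = 12, 5

t_high = percent(T_TROPIC_ABOVE_50, T_TROPIC_TOTAL)
m_low = percent(M_TROPIC_TOTAL - M_TROPIC_ABOVE_50, M_TROPIC_TOTAL)

print(f"T-tropic isolates with more than 50% affinity: {t_high:.1f}%")
print(f"M-tropic isolates with 50% affinity or less:  {m_low:.1f}%")
```

Rounded to whole numbers, these are the 74% and 58% figures quoted in the Conclusions of this chapter.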
5.5.6
Prediction of HIV Tropic and Phenotypic Associations:
Based on the results presented in Tab.s 5.2 and 5.3, the HIV-1 T-tropic viruses are found to possess higher affinity for the CD4 and SI capacity. This is unlike the M-tropic viruses. It can therefore be predicted in this study (Tab. 5.8) that the two HIV-1 isolates with the highest affinity for the CD4, OYI (87.51%) and KB-1/ETR (67.19%), belong to the T-tropic and SI groups. In the same manner, the isolates that demonstrated the least binding interaction with the CD4, 96CM-MP535 (12.36%) and MAL (29.21%), may belong to the M-tropic and NSI categories.
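The rank-based prediction described above can be written down as a short sketch. It is an illustration only, under the assumption that the prediction is made purely by ordering the isolates of unknown association by their affinity at the Consensus Frequency; the function name `predict_by_rank` and the label strings are hypothetical, and only the four affinities quoted in the text are used.

```python
# Affinities (%) at the Consensus Frequency for the four isolates named in the text.
UNKNOWN_ISOLATES = {
    "OYI": 87.51,
    "KB-1/ETR": 67.19,
    "96CM-MP535": 12.36,
    "MAL": 29.21,
}


def predict_by_rank(affinities, n_extreme=2):
    """Label the n_extreme lowest-affinity isolates M-tropic/NSI and the
    n_extreme highest-affinity isolates T-tropic/SI."""
    ranked = sorted(affinities, key=affinities.get)  # ascending affinity
    predictions = {}
    for name in ranked[:n_extreme]:
        predictions[name] = "HIV-1 M-tropic and NSI"
    for name in ranked[-n_extreme:]:
        predictions[name] = "HIV-1 T-tropic and SI"
    return predictions


for isolate, group in predict_by_rank(UNKNOWN_ISOLATES).items():
    print(f"{isolate}: {UNKNOWN_ISOLATES[isolate]:.2f}% -> {group}")
```

A ranking of this kind only orders the isolates relative to one another; as noted in the Discussion, the resulting assignments would still need clinical verification.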
Table 5.8: Predicted HIV-1 Tropic and Phenotypic associations. Two isolates with least affinity for the CD4 are predicted to belong to HIV-1 M-tropic and NSI, while two with the highest attraction to the CD4 are envisaged to be part of T-tropic and SI groups.

Isolate       % affinity   Rank          Predicted group
96CM-MP535    12.36%       Least         HIV-1 M-tropic and NSI
MAL           29.21%       2nd least     HIV-1 M-tropic and NSI
KB-1/ETR      67.19%       2nd Highest   HIV-1 T-tropic and SI
OYI           87.51%       Highest       HIV-1 T-tropic and SI
5.5.7
Maximum Amplitude-based Categorization:
Based on the position of maximum amplitude, the HIV and SIV isolates as well as the host species investigated in this research are also categorized into groups. A common position of maximum amplitude appears to suggest consensus maximum interaction with another protein. This characteristic seems to be peculiar to a group of proteins and, as a result, consensus maximum interaction suggests a demonstration of common biological functionality. In this study therefore, the position of maximum amplitude is used to categorize HIV isolates and host species.
5.5.7.1
HIV and SIV Isolates:
RRM-based categorization of the HIV and SIV isolates is attempted by using similarity in the position or frequency of the highest amplitude. For the HIV-1, a clone MFA and three isolates BH10, HXB3 and HXB2, the parent isolate of MFA [162; 164], are found to have higher amplitudes of 1.0, 0.9814, 0.9655 and 0.9221, respectively, at the same position 18 (f=0.0354) (Tab. 5.2). Preliminary phylogenetic studies have shown that HXB3, BH8, HXB2, BH10, PV22, NL43 and MFA originate from the same source [172; 215] while MFA is a clone of HXB2 [162; 164]. This therefore appears to explain why they all have such high and very similar amplitude values at the Consensus Frequency. Other isolates with identical spectral characteristics suggesting common relationships include ELI, MAL and Z2/CDC-Z34 (Fig. 5.7). They are identified in this study to have maximum amplitude values of 0.6326, 0.6445 and 0.7289, respectively, at the same position 150 (f=0.295), which symbolizes common biological functionality. MAL, ELI and Z2 (also known as CDC-Z34) have been identified as Zairian isolates [170; 209]. OYI, Z6 and CDC-451 (Fig. 5.3) have their highest amplitudes of 1.0, 0.7347 and 0.6854, respectively, at the same position 155 (f=0.305). CDC-451 is an American isolate [173], whereas OYI and Z6 belong to the Gabonese and Zairian stock [170; 209]. The result suggests a strong relationship amongst them. Another set of three isolates which share the highest amplitude at the same frequency or position consists of an American isolate (SC), a Zairian isolate (WMJ1) and a Cameroonian isolate called 96CM-MP535 (Fig. 5.4). They have their highest amplitudes of 0.6842, 0.5042 and 0.5499, respectively, at position 152 (f=0.299). Also, SF2, an infectious clone from isolate ARV-2 [170; 209], is found to share the highest peak at position 155 (f=0.305) with JRCSF. Both are of American stock [170; 209]. Both results appear to indicate that there may exist a relationship amongst each set of three isolates, which may have come from laboratory handling or cross-Atlantic transmission and therefore needs further experimental investigation. VI850, which is one of the HIV-1 group M isolates [213], is found to share the highest amplitude at position 158 (f=0.311) with that of a chimpanzee isolate, MB66 (Fig. 5.5). Using a phylogenetic tree, Wain et al. [216] have shown that three independent ape-to-human cross-species transmissions of SIVcpzPtt, which infects West Central African chimpanzees (Pan troglodytes), are the foundation of the pandemic (group M) and non-pandemic (groups N and O) clades of HIV-1. This appears to concur with the initially identified ape-to-human cross-species transmission [179]. Amongst the SIV, TAN1, a Tanzanian isolate, is discovered to share the same maximum point of interaction at position 200 (f=0.394) with its Gabonese counterpart called GAB1. This also suggests a relationship, which might be a result of common origin. Based on the outcome of the study, two isolates from the group of unknown associations, namely OYI, which has 87.51% affinity for the CD4, and KB-1/ETR with 67.19%, are predicted to belong to the T-tropic association with SI capacity. In a similar manner, two other isolates with the least attraction to the CD4, 96CM-MP535 (12.36%) and MAL (29.21%), may belong to the M-tropic group with NSI capacity.
5.5.7.2
Host Organisms:
The Spectral characteristics of the protein residues of the CD4 from the host organisms are examined in relation to their individual highest amplitude positions and their corresponding amplitudes. The SC of the CD4 of some of the species studied revealed the same positions for the highest amplitude (highest interaction), suggesting common biological relationships. This attribute is employed in the categorization of isolates and host species. It is demonstrated in Fig. 5.8 and Tab. 5.1 that the Spectral features of the protein residues of the CD4 from Homo sapiens (Human) and Pan troglodytes (Chimpanzee) share similar characteristics. They have amplitude values of 0.5276 and 0.5280, respectively, at position 18 (CF=0.0373). This therefore appears to suggest that the CD4 of Human and Chimpanzee have very similar binding powers (52.76% and 52.80%, respectively) for the HIV. At position 68 (f=0.1410), the CD4 from Human and Chimpanzee also have their highest amplitudes of 0.7235 and 0.7640, respectively. Several experimental procedures have been used to verify the close relationship between Human and Chimpanzee [41]. As much as 98% similarity has been identified in their DNA [217]. This result obtained at the proteomic level therefore appears to strengthen the already established relationships between Human and Chimpanzee. Fig. 5.9 and Tab. 5.1 illustrate the spectral characteristics of the CD4 from other organisms including Dancing, Green and Pig-tailed monkeys. These species have their highest amplitude values of 0.6592, 0.7699 and 0.7511, respectively, at the same position, 101 (f=0.210). Details of the spectral characteristics of each amino acid sequence of the CD4 from the 25 HIV host organisms are shown in Tab. 5.1. Based on these outcomes, it is also shown that the CD4 of the European sea bass is most attracted to the HIV/SIV isolates (81.10%), while that of the Japanese puffer-fish has the least affinity with the viruses (27.14%).
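The maximum amplitude-based categorization used in this section amounts to grouping proteins by the position of their highest spectral peak. A minimal sketch is given below; it is illustrative only, the function name `group_by_peak` is hypothetical, and the peak positions are those quoted in the text and figure captions of this chapter. The gp120 and CD4 spectra are kept in separate dictionaries because their positions refer to different frequency axes.

```python
from collections import defaultdict

# Position of maximum amplitude for the gp120 of selected HIV/SIV isolates.
GP120_PEAKS = {
    "ELI": 150, "MAL": 150, "Z2/CDC-Z34": 150,
    "SC": 152, "96CM-MP535": 152, "WMJ1": 152,
    "OYI": 155, "Z6": 155, "CDC-451": 155, "SF2": 155, "JRCSF": 155,
    "VI850": 158, "MB66": 158,
    "TAN1": 200, "GAB1": 200,
}

# Position of maximum amplitude for the CD4 of selected host species.
CD4_PEAKS = {
    "Human": 68, "Chimpanzee": 68,
    "Dancing Monkey": 101, "Green Monkey": 101, "Pig-tailed Monkey": 101,
}


def group_by_peak(peaks):
    """Map each peak position to the sorted list of proteins that share it."""
    groups = defaultdict(list)
    for name, position in peaks.items():
        groups[position].append(name)
    return {position: sorted(names) for position, names in sorted(groups.items())}


for position, members in group_by_peak(GP120_PEAKS).items():
    print(position, members)
for position, members in group_by_peak(CD4_PEAKS).items():
    print(position, members)
```

Grouping by exact position is the simplest possible rule; whether neighbouring positions should be merged into one group is a design choice that the text does not specify.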
5.6
Conclusions:
Categorization of organisms and viruses has helped in the design and development of therapies. It has also assisted in the effective management of disease. This approach was utilized in designing, developing and monitoring the efficacy of Maraviroc, a CCR5 antagonist. Categorization of the HIV/SIV, as well as investigation into relationships that exist between and amongst HIV/SIV and their hosts, are achieved in this study using a Bioinformatics technique called Resonant Recognition Model (RRM).

RRM is first employed to determine the amplitudes of the various gp120 from HIV and SIV isolates at the point of interaction with the CD4 (Consensus Frequency). The Consensus Frequencies of the HIV gp120 and the CD4 of the host obtained are found to be in accord with those of a preliminary study [98]. The amplitude values signify the degree of affinity because an amino acid scale called EIIP, which governs bio-recognition and bio-attachment, is used. The degrees of affinity at the point of interaction with the CD4 are further used to categorize HIV/SIV into tropic and phenotypic associations. The maximum level of affinity is also used to classify both the HIV/SIV and the hosts.

It is identified in this study that 74% of HIV-1 T-tropic viruses have over 50% affinity for CD4. Also, 58% of the HIV-1 M-tropic isolates are found to have weak affinity (below 50%) for the CD4. OYI and KB-1/ETR, with the highest affinity for the CD4, are predicted to belong to the T-tropic association with SI capacity, while 96CM-MP535 and MAL, which have the least binding interaction with the CD4, are predicted to belong to the M-tropic group with NSI capacity.

Our results also identified common relationships amongst the isolates and host organisms. For example, an American isolate, CDC-451, is found to share a common feature, the highest amplitude position, with the Gabonese (OYI) and Zairian (Z6) counterparts, which appears to suggest trans-Atlantic transmission. All HIV-1 T-tropic viruses, which are also identified to belong to the SI group, are found to have high affinity for the CD4. In addition, ape-to-human cross-species transmission of the viruses appears to be established, as HIV-1 VI850 shares a common position of maximum interaction, which indicates common biological functionality, with the chimpanzee SIV isolate MB66. This characteristic is also noticed in HIV isolates from the same location such as BH8, BH10, HXB2 and HXB3. These outcomes appear to strengthen clinically established findings.

In addition, the results obtained in these experiments suggest that there are strong correlations between affinities and HIV tropic and phenotypic associations. The results also reveal a common point of maximum interaction amongst species, which is an indication of common biological functionality amongst the organisms and viruses. A common point of maximum interaction is observed in Human and Chimpanzee, an outcome that appears to strengthen the already established common characteristics between them. However, at the proteomic level, this finding seems to be the first made. Besides, the Japanese puffer-fish demonstrated the least attraction, unlike the European sea bass.

Categorization of the HIV into tropic, phenotypic and other associations will be useful when engaged in the designing and development of HIV/AIDS therapeutic interventions, and also in re-strategizing treatment and management plans. Categorization of the HIV by CCR5 co-receptor usage has previously helped in the designing of the CCR5 antagonist called Maraviroc, which is still one of the FDA-approved anti-retroviral agents in use for the management of HIV/AIDS. It is noted in this study that the materials and the procedure are intrinsically validated. This is because the protein residues are obtained from clinically experimented findings while the ASSs have been clinically and computationally derived.
Chapter 6
Informational Spectrum Method-based Approach to Assessing Drug Resistance: HIV as a case study
6.1
Summary:
In this Chapter, a Digital Signal Processing-based technique called Informational Spectrum Method (ISM) is engaged in the assessment of resistance to five classes of anti-HIV/AIDS drugs. They are Fusion Inhibitors (Enfuvirtide), Protease Inhibitors (Darunavir), Integrase Inhibitors (Raltegravir), Maturation Inhibitors (Bevirimat) and Nucleotide Reverse Transcriptase Inhibitors (Lamivudine). A version of these studies is published as: Nwankwo N, Seker H. 2010. "A signal processing-based bioinformatics approach to assessing drug resistance: human immunodeficiency virus as a case study". Conf. Proc. IEEE Eng Med Biol Soc. 2010:1836-1839.

The mechanism of action of each anti-HIV/AIDS drug, and hence the resistance resulting from exposing it to the target proteins, is first studied at the molecular level so as to justifiably obtain and engage appropriate amino acid scales. The anti-HIV/AIDS drugs studied are FDA-approved HIV/AIDS drugs drawn from the five classes. The contributions offered by the protein residues of the target enzymes, for both the susceptible strains of the HIV and the mutation-induced resistant strains, are assessed for the five anti-retroviral agents. The degrees of interaction displayed by the mutated protein residues of the resistant strains of micro-organisms and viruses have been identified to be lower compared to those from the susceptible strains. This appears to signify that the mutated protein residues exhibited reduced susceptibility, or resistance. This finding has helped to propose a computational tool that will assess resistance to a drug without involving expensive and slow clinical experimentation. It is called Computer-Aided Drug Resistance Calculator (CADRC).

In clinical practice, measuring drug resistance is challenging. It is also an essential pharmaceutical activity. It is a labour-intensive and expensive laboratory-based experimentation. Several clinical and pharmaceutical experiments have been conducted in order to obtain the resistance to various therapeutic agents that results from mutations acquired by the organisms exposed to them. Similarly, the positions of these mutations and the degree of resistance (folds) have also been experimentally identified. G36S and V38M mutations in the Human Immunodeficiency Virus (HIV) Trans-membrane glycoprotein (gp41), for example, have been recognized to cause 100-fold resistance to the fusion inhibitor called T20. However, Digital Signal Processing-based approaches such as Informational Spectrum Method, which engages the amino acid information of the proteins involved and an appropriate amino acid scale, have not been used to computationally assess the degree of drug resistance. By using the ISM and one appropriate amino acid scale per drug, the technique was applied to five classes of anti-HIV/AIDS drugs as a case study.

It is observed that the protein residues of the susceptible strains demonstrated maximal interaction at the point of interaction, unlike the resistant strains, which showed lower interaction amplitudes. This result signifies lower contribution from the resistant strains due to the mutations. The findings are found to be in accord with those of the experimental ones.
6.2
Introduction
At the molecular level, drug resistance manifests as the emergence of amino acid alterations in one or more of the positions in the entire sequences of the organism. These alterations are called mutations. At the physical level, pharmacodynamic, pharmacokinetic and pharmacogenomic factors are implicated in drug resistance. These arise from disintegration, dissolution, absorption and subtherapeutic concentrations, and the resulting resistance is mutation-induced [218]. The after-effect of these physical factors is mutations. Computer-assisted assessment of drug resistance using information on mutations has become necessary as it is rational. Resistance arising from mutations in the target protein residues of organisms exposed to these therapeutic agents has been recognized [218] - [219]. Such studies have been carried out in micro-organisms [218], malaria parasites [219] and HIV [220] - [221]. The points of mutation and the degree of resistance (fold) amongst these organisms exposed to various therapeutic agents have been identified. The Informational Spectrum Method (ISM) procedure holds that any change in the amino acid composition in any position in the sequence of protein residues affects both spectral and cross-spectral features [33]. Biologically, these mutational changes in the sequences of living organisms and viruses determine the structural and physicochemical properties of their protein contents. This governs the general behaviour of the organisms. DSP techniques allow these bio-functionalities to be calculated [35; 92]. Using these structural and physicochemical properties, the ISM procedure is engaged to find out if drug resistance arising from five anti-retroviral agents can be derived from their sequence information. In this Chapter, one drug each from five classes of the anti-retroviral agents in use for HIV/AIDS management is investigated so as to assess its susceptibility or resistance resulting from mutations in its HIV target proteins.
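The statement that a single amino acid change alters the spectrum can be illustrated with a small sketch. It is not the procedure used in this Chapter: the sequences are hypothetical toy strings, the function name is illustrative, the spectrum is taken simply as the magnitude of the discrete Fourier transform of the encoded sequence, and the three scale values are those listed for ROBB760104 in Table 6.1.

```python
import cmath

# ROBB760104 values for the three residues used below (from Table 6.1).
HELIX_SCALE = {"G": -8.3, "S": -1.9, "V": 2.3}


def informational_spectrum(seq, scale):
    """Return the DFT amplitudes of the encoded sequence for k = 1 .. len(seq) // 2."""
    signal = [scale[aa] for aa in seq]
    n = len(signal)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, v in enumerate(signal)))
        for k in range(1, n // 2 + 1)
    ]


# Toy wild-type sequence with a strict period-2 pattern, and a mutant that
# differs only by the substitution G -> S at position 1 (hypothetical data).
wild_type = "GV" * 8
mutant = "S" + wild_type[1:]

wt_spectrum = informational_spectrum(wild_type, HELIX_SCALE)
mut_spectrum = informational_spectrum(mutant, HELIX_SCALE)

# Compare both sequences at the position where the wild type peaks.
peak_index = wt_spectrum.index(max(wt_spectrum))
print("peak position:", peak_index + 1)
print("wild-type amplitude:", round(wt_spectrum[peak_index], 2))
print("mutant amplitude:   ", round(mut_spectrum[peak_index], 2))
```

In this toy case the single substitution lowers the amplitude at the wild-type peak and spreads energy to the other positions, which mirrors, in a very simplified way, the comparison of susceptible and resistant strains made in this Chapter.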
6.3
Materials and Methods:
6.3.1
Materials:
Anti-retroviral agents including Fusion Inhibitors (Enfuvirtide), Protease Inhibitors (Darunavir), Integrase Inhibitors (Raltegravir), Nucleotide Reverse Transcriptase Inhibitors (Lamivudine) and Maturation Inhibitors (Bevirimat) are first studied for their possible mechanisms of action and resistance so as to justify the choice of amino acid scale used. The five amino acid scales engaged in this study are shown in Tab. 6.1. Then, the consensus amino acid sequences of the HIV target proteins were retrieved from Stanford University Release Notes [221]. Corresponding amino acid sequences of the susceptible and resistant strains, as documented in various publications, were then constructed from the consensus sequences and analyzed using ISM. The classes of the anti-retroviral agents examined are:
6.3.1.1
Fusion/Entry Inhibitors (example: Enfuvirtide)
Two helical regions of the HIV Transmembrane protein (Heptad Repeats HR1 and HR2) are known to undergo a molecular re-organization, which ushers in inter-cellular perforation and eventual transfer of the HIV viral content into the host T-cells [222]. The conformational re-arrangement results in the formation of the six-helix bundle structure. This process is inhibited by Enfuvirtide (T20), a synthetic structural analogue of Heptad Repeat HR2 composed of 36 amino acids [222]. As a result of the Helix-oriented conformational re-arrangement, a Helix conformation-based amino acid scale with the descriptor name ROBB760104 and title "Information measure for C-terminal helix", obtained from GenomeNet Database Resources [3], is chosen for this analysis. It has been noted that double point mutations (G36S and V38M) in the HIV Transmembrane protein (HIV gp41) demonstrate a 100-fold increase in resistance in comparison with the wild-type virus [223]. In addition, other mutations such as G36D/E, V38E/A, Q40H and N43D have been associated with T20 resistance [221]. These two sets of mutations are investigated in this Chapter using ISM.
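Constructing a resistant-strain sequence from a consensus sequence, as described in the Materials, can be sketched as follows. This is an illustration under stated assumptions: the function name `apply_mutations` is hypothetical, the consensus string is a made-up placeholder that merely has G at position 36 and V at position 38 (it is not the gp41 consensus), and only simple mutation codes of the form wild-type residue, position, new residue (such as G36S) are handled; combined codes such as G36D/E would have to be expanded by hand first.

```python
import re

# One mutation code: wild-type residue, 1-based position, substituted residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(consensus, mutations):
    """Return a copy of consensus with each point mutation applied."""
    residues = list(consensus)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation)
        if match is None:
            raise ValueError(f"unsupported mutation format: {mutation}")
        wild, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {mutation}")
        if residues[position - 1] != wild:
            raise ValueError(
                f"{mutation}: consensus has {residues[position - 1]} at {position}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Placeholder consensus (hypothetical): G at position 36, V at position 38.
TOY_CONSENSUS = "A" * 35 + "GIV" + "A" * 7
resistant = apply_mutations(TOY_CONSENSUS, ["G36S", "V38M"])
print(TOY_CONSENSUS[33:40])
print(resistant[33:40])
```

Checking the wild-type residue before substituting guards against numbering mismatches between the mutation list and the consensus sequence in use.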
Table 6.1: Amino acid scales engaged: (1) LIFS790103 [3], (2) WILM950103 [4], (3) BEGF750102 [3], (4) ROBB760104 [5] and (5) EIIP [6]

Amino Acids          1      2       3      4       5
Alanine (A)          0.90   -1.64   0.77   2.3     0.0373
Arginine (R)         1.02   -3.28   0.72   1.4     0.0959
Asparagine (N)       0.62   0.83    0.55   -3.3    0.1263
Aspartic acid (D)    0.47   0.70    0.65   -4.4    0.0036
Cysteine (C)         1.24   9.30    0.65   6.1     0.0829
Glutamine (Q)        1.18   -0.04   0.72   2.7     0.0761
Glutamic acid (E)    0.62   1.18    0.55   2.5     0.0058
Glycine (G)          0.56   -1.85   0.65   -8.3    0.0050
Histidine (H)        1.12   7.17    0.83   5.9     0.0242
Isoleucine (I)       1.54   3.02    0.98   -0.5    0.0000
Leucine (L)          1.26   0.83    0.83   0.1     0.0000
Lysine (K)           0.74   -2.36   0.55   7.3     0.0371
Methionine (M)       1.09   4.26    0.98   3.5     0.0823
Phenylalanine (F)    1.23   -1.36   0.98   1.6     0.0946
Proline (P)          0.42   3.12    0.55   -24.4   0.0198
Serine (S)           0.87   1.59    0.55   -1.9    0.0829
Threonine (T)        1.30   2.31    0.83   -3.7    0.0941
Tryptophan (W)       1.75   2.61    0.77   -0.9    0.0548
Tyrosine (Y)         1.68   2.37    0.83   -0.6    0.0516
Valine (V)           1.52   0.52    0.98   2.3     0.0057

6.3.1.2
Nucleotide Reverse Transcriptase Inhibitors (example: Lamivudine)
In order for the genetic material of the HIV, which is a single-stranded viral RNA genome, to integrate as double-stranded proviral DNA into the genome of the host species, there is need for it to be reverse-transcribed. The Nucleotide Reverse Transcriptase enzyme is involved in this process [224]. This enzyme comprises two subunits derived from Gag-Pol, of 66 and 51 kDa, also called p66 and p51, respectively [220; 222]. The active-site carboxylates, namely Asp110, Asp185 and Asp186, bring about the binding of the two divalent cations, such as Magnesium (Mg2+) and Calcium (Ca2+), required for catalysis [220; 225]. One FDA-approved anti-retroviral agent from this class called Lamivudine, which is already employed in HIV/AIDS management, was studied here. This is in order to re-validate the fact that its resistance can be assessed using sequence information. A single mutation in the Reverse Transcriptase such as M184V is known to cause a high level of resistance to Lamivudine [220; 226]. In addition, steric hindrance resulting from Beta-branched amino acid constituents such as Valine and Isoleucine at position 184 is identified to inhibit binding of Nucleotide RT inhibitors, as in the case of Lamivudine and Emtricitabine [220; 224]. As a result, an amino acid scale which governs steric conformation, deposited in GenomeNet Database Resources [3] with the descriptor name BEGF750102 and title "Conformational parameter of beta-structure", was engaged in this study. The approach involves the extraction and analysis of the consensus amino acid sequence of the Nucleotide Reverse Transcriptase enzyme from Stanford University Release Notes [221], and those of the Lamivudine susceptible and resistant strains constructed as recorded in literature [220] and [221].
6.3.1.3
Protease Inhibitors (example: Darunavir)
HIV Protease enzymes are known to cleave the Gag and Gag-Pol polyproteins into the units needed for maturation. These protease enzymes are made up of two polypeptide chains, each with the conserved residues Asp-Thr-Gly, which provide the aspartyl groups engaged in the catalytic process [220; 222]. Protease inhibitors are substrate-based competitive antagonists [222]. Essentially, interactions between the substrates and the protease enzymes are known to be principally hydrophobic [222]. Darunavir is an FDA-approved Protease Inhibitor and is investigated in this study. Accordingly, the hydrophobicity-based amino acid scale deposited in the GenomeNet Database Resources [3] with the descriptor name WILM950103, titled "Hydrophobicity coefficient in Reverse Phase High Performance Liquid Chromatography (RP-HPLC), C4 with 0.1 percent TFA/MeCN/H2O" [4], is engaged. As a procedure, the consensus amino acid sequence of the HIV protease enzyme was retrieved from the Stanford University Release Notes [221]. The corresponding Darunavir susceptible and resistant sequences were constructed based on the sequence information obtained from the Stanford University HIVDB report [221] and analysed. An example of the retrieval and construction of sequences for the susceptible (mutant 1) and resistant (mutant 2) strains in the case of Darunavir is shown in Fig. 6.1.
Figure 6.1: Protein 1 (Darunavir): Sequences showing mutations in elongated letters. There are 5 mutations in the amino acid sequences of the resistant strain.
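The construction of a mutant sequence from a consensus sequence, as described above, can be sketched in a few lines of Python. This is a minimal illustration only and is not the procedure used in the thesis: the function name `apply_mutations` and the toy consensus string are hypothetical, and the sketch assumes mutations are written in the usual form wild-type residue, 1-based position, new residue (for example M184V).

```python
import re

def apply_mutations(consensus, mutations):
    """Return a copy of `consensus` with point mutations such as 'M184V' applied.

    Positions are 1-based. Each mutation is checked against the residue
    currently at that position so that a wrong reference sequence is caught.
    """
    seq = list(consensus)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation format: {mutation}")
        wild, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the sequence")
        if seq[pos - 1] != wild:
            raise ValueError(f"expected {wild} at position {pos}, found {seq[pos - 1]}")
        seq[pos - 1] = new
    return "".join(seq)

# Hypothetical toy consensus, not a real HIV sequence.
toy_consensus = "ACDEFGHIK"
toy_mutant = apply_mutations(toy_consensus, ["D3N", "K9R"])
print(toy_mutant)
```

Checking the wild-type residue before substituting is a design choice of this sketch; it guards against applying a published mutation list to a reference sequence with different numbering.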
6.3.1.4
Maturation Inhibitors (example: Bevirimat (BVM))
Maturation Inhibitors are non-enzymatic protease inhibitors. Unlike the Protease Enzyme Inhibitors, which are known to act directly on the enzyme that processes the Gag and Gag-Pol polyproteins, the Maturation Inhibitors block the cleavage process by binding to the immature HIV particle [222; 227] - [228]. By attaching to the immature HIV particle, they disrupt the maturation process. One Maturation Inhibitor that has been investigated for the management of HIV/AIDS is Bevirimat (BVM). Bevirimat-associated mutations have been identified, including sixteen sets that offer as much as 100-fold resistance [229]. In this research, one set of these mutations is analyzed using ISM. The mutations are H358Y, L363F, A364V and A366T [227] - [229]. To justify the choice of amino acid scale, the mechanism of action, and therefore of resistance, is studied. It is understood that BVM bio-recognizes and bio-attaches to the immature Gag and Gag-Pol polyproteins and prevents their processing [230]. As a result, EIIP, an amino acid scale that governs bio-recognition and bio-attachment, is considered. The consensus amino acid sequence of the target protein of BVM is extracted from UNIPROT. The sequences of the susceptible and resistant strains are constructed from the consensus as recorded in the literature and analyzed. The retrieval and construction of sequences for the susceptible and resistant strains of Bevirimat are exemplified in Fig. 6.2.
6.3.1.5
Integrase Enzyme Inhibitors (example: Raltegravir)
After maturation, the transcribed HIV DNA is inserted into the human genome. This process is called Integration and is catalyzed by the Integrase Enzyme, which is composed of 288 protein residues [222; 230]. Inhibition is achieved by quickening the 3'-end processing and, thereafter, the strand transfer, resulting in the blocking of viral integration into the human genome [222; 230]. Raltegravir is one of the anti-HIV/AIDS agents belonging to the Integrase Enzyme Inhibitors that has been in use for the management of HIV/AIDS. Mutations resulting in reduced susceptibility have been recorded. They include E92Q, F121Y, E138A, G140A, Y143R, S147G, Q148H and N155H [222]. Resistance arising from these mutations is assessed in this study using one of the strand transfer-based amino acid scales deposited in the GenomeNet Database Resources [3], which has the descriptor name LIFS790103. The consensus amino acid sequence of the target protein of Raltegravir is extracted from UNIPROT. The sequences of the susceptible and resistant strains are constructed from the consensus sequence as recorded in the literature and analyzed. Over 565 amino acid scales are available, and more than one scale is found to account for a single mutation [70]. The retrieval and construction of sequences for the susceptible and resistant strains of Raltegravir are exemplified in Fig. 6.3.
Figure 6.2: Protein 2 (Bevirimat): Sequences showing mutations in red and elongated letters. There are 3 mutations in the amino acid sequences of the resistant strain.
6.3.2
Informational Spectrum Method (ISM)
The Informational Spectrum Method (ISM) has been used by researchers for various purposes, such as the functional classification of proteins [33]. ISM was therefore utilized in this study to computationally determine the degree of drug resistance. It is detailed in Chapter 3 and briefly described below.
Figure 6.3: Protein 3 (Raltegravir): Sequences showing mutations in red and elongated letters. There are 8 mutations in the amino acid sequences of the resistant strain.
The ISM involves three main steps: (1) Converting the alphabetic code of the amino acid sequences into numerical values using an amino acid scale that relates to the interaction under investigation, for example the hydrophobicity-oriented, substrate-based competitive antagonism of the HIV Protease enzyme by Protease Inhibitors like Darunavir mentioned earlier. (2) Processing the numerical sequences (signals) using the discrete Fourier transform (DFT). The absolute values of the complex DFT, represented as a plot called the Informational Spectrum (IS), disclose the information embedded in the protein residues. Here, the y-axis of the IS, the amplitude, signifies the contribution of each sequence in terms of susceptibility or resistance, while the x-axis, the frequency, determines the position of interaction, which here is the inhibition by anti-HIV/AIDS drugs. (3) Obtaining the Common Informational Spectrum (CIS). When comparing proteins with common biological functions, like the anti-HIV/AIDS inhibitory activities being examined, point-wise multiplication of the DFT-processed signals from all the sequences studied provides the information common to them. The method was applied to five classes of anti-HIV/AIDS drugs as a case study to show its capability of assessing the degree of drug resistance without the use of any laboratory-based methods. Its results are presented and discussed in the next section.
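The three ISM steps can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation used in this study: the EIIP values are those commonly tabulated in the ISM literature and should be checked against Chapter 3, the removal of the mean before the DFT and the scaling of amplitudes so that the largest value at the consensus position equals 1.0 are assumptions, and the function names and toy sequences are hypothetical.

```python
import numpy as np

# EIIP scale as commonly tabulated in the ISM literature
# (assumption: verify against the table given in Chapter 3).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}

def informational_spectrum(seq, scale):
    """Steps 1 and 2: encode the sequence numerically and take |DFT|."""
    signal = np.array([scale[residue] for residue in seq], dtype=float)
    signal = signal - signal.mean()        # assumption: drop the constant offset
    return np.abs(np.fft.rfft(signal))[1:]  # discard the zero-frequency bin

def common_informational_spectrum(seqs, scale):
    """Step 3: point-wise product of the spectra of equal-length sequences."""
    spectra = np.array([informational_spectrum(s, scale) for s in seqs])
    return spectra.prod(axis=0)

def amplitudes_at_consensus(seqs, scale):
    """Locate the CIS peak and report each sequence's amplitude there,
    scaled so that the largest amplitude is 1.0 (assumed normalisation)."""
    cis = common_informational_spectrum(seqs, scale)
    peak = int(np.argmax(cis))
    raw = [informational_spectrum(s, scale)[peak] for s in seqs]
    top = max(raw)
    return peak + 1, [value / top for value in raw]

# Hypothetical toy sequences of equal length, not real HIV proteins.
toy_susceptible = "MKVLAAGIVGLLLAQWERT"
toy_resistant = "MKVLAAGIVGFLLAQWERT"
position, amplitudes = amplitudes_at_consensus([toy_susceptible, toy_resistant], EIIP)
print(position, amplitudes)
```

The sketch requires sequences of equal length, which holds when mutants are constructed from a consensus sequence by substitution only. Any other scale, such as the ones named in this chapter, could be passed in place of `EIIP` as a dictionary of the same form.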
6.4
Results
The results of the Common Informational Spectrum (CIS) and Informational Spectrum (IS) analyses of the amino acid sequences from the HIV target sites of the Fusion Inhibitors (Enfuvirtide), Protease Inhibitors (Darunavir), Nucleotide Reverse Transcriptase Inhibitors (Lamivudine), Maturation Inhibitors (Bevirimat) and Integrase Inhibitors (Raltegravir) are presented in Figs. 6.4-6.18 and Tab. 6.2.
Table 6.2: Resistance offered by the HIV target proteins of the anti-HIV drugs, obtained as the differences between the amplitudes of the Susceptible and Resistant strains at the points of interaction between the drugs and the target proteins.
Drug         Scale       CF   Susceptible  Resistant  Resistance  % Resistance
Enfuvirtide  ROBB760104  1.0  1.0          0.9777     0.0223      2.23
Lamivudine   BEGF750102  1.0  1.0          0.9467     0.0533      5.33
Darunavir    WILM950103  1.0  1.0          0.8995     0.1005      10.05
Bevirimat    EIIP        1.0  1.0          0.9352     0.0648      6.48
Raltegravir  LIFS790103  1.0  1.0          0.9684     0.0316      3.16
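The last two columns of Table 6.2 follow directly from the two amplitude columns. The short Python sketch below reproduces that arithmetic from the tabulated amplitudes; the dictionary and function names are illustrative only.

```python
# Susceptible and resistant amplitudes at the point of interaction (Table 6.2).
TABLE_6_2 = {
    "Enfuvirtide": (1.0, 0.9777),
    "Lamivudine": (1.0, 0.9467),
    "Darunavir": (1.0, 0.8995),
    "Bevirimat": (1.0, 0.9352),
    "Raltegravir": (1.0, 0.9684),
}

def resistance(susceptible, resistant):
    """Resistance is the drop in amplitude from susceptible to resistant strain."""
    return susceptible - resistant

# Percentage resistance per drug, rounded as in the table.
percent_resistance = {
    drug: round(100 * resistance(sus, res), 2)
    for drug, (sus, res) in TABLE_6_2.items()
}

for drug, percent in percent_resistance.items():
    print(f"{drug}: {percent}%")
```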
6.4.1
Fusion Inhibitors (example: Enfuvirtide)
The CIS for Enfuvirtide shows a peak amplitude of 1.0 at the consensus peak position 3 (Fig. 6.4). The IS of the Enfuvirtide susceptible and resistant strains have peak amplitudes of 1.0 (Fig. 6.5) and 0.9777 (Fig. 6.6), respectively, at the same peak position 3, and as such differ by 0.0223 (Tab. 6.2). This shows that the resistant strain contributed less to the CIS; the difference is small because only one scale was used.
6.4.2
Protease Inhibitors (example: Darunavir)
Similarly, the CIS of Darunavir shown in Fig. 6.7 displayed a consensus peak at position 39 with an amplitude of 1.0, while the IS of the susceptible and resistant strains revealed peak amplitudes of 1.0 and 0.8995, respectively (Figs. 6.8 and 6.9), at the same peak position 39. In effect, the susceptible strain contributes more to the consensus amplitude by 0.1005 (Tab. 6.2).
Figure 6.4: CIS of the protein residues of both susceptible and resistant strains exposed to Fusion inhibitors (Enfuvirtide) showing amplitude of 1.0 at position 3, indicating 100% Pharmacological affinity for both sequences at the point of interaction.
6.4.3
Nucleotide Reverse Transcriptase Inhibitors (example: Lamivudine)
In addition, the Lamivudine spectral analysis revealed a consensus peak at position 40 with an amplitude of 1.0 (Fig. 6.10), while the susceptible and resistant strains have peak amplitudes of 1.0 and 0.9467, respectively (Figs. 6.11 and 6.12), at the same peak position 40. Hence, it can be concluded that the susceptible strain contributed more to the consensus amplitude by 0.0533 (Tab. 6.2).
Figure 6.5: IS of the protein residues of the susceptible strain exposed to Fusion inhibitors (Enfuvirtide) showing amplitude of 1.0 at position 3, indicating 100% Pharmacological affinity for amino acid sequences of the susceptible strain at the point of interaction.
6.4.4
Maturation Inhibitors (example: Bevirimat (BVM))
The Cross-Spectral analysis of the target protein residues of the anti-retroviral agent Bevirimat demonstrates a high amplitude of 1.00, or 100% Pharmacological activity, for the susceptible strain (Fig. 6.14), while the resistant strain shows an amplitude of 0.9352 (Fig. 6.15) at the point of interaction. The EIIP scale is used because the interaction considered in this study is bio-recognition and bio-attachment. The point of interaction (Consensus Frequency) displayed by the Cross-Spectral analysis is at position 165 (Fig. 6.13). This therefore suggests a resistance of 6.48%.
6.4.5
Integrase Enzyme Inhibitors (example: Raltegravir)
The CIS analysis of the amino acid sequences of the Integrase Enzyme, the target protein of the anti-HIV/AIDS agent Raltegravir, using the susceptible and resistant strains demonstrates a Consensus Frequency at position 67 with an amplitude of 1.00 (Fig. 6.16). The resistant strain shows efficacy lowered by 3.16%. This is because the susceptible strain has an amplitude of 1.00, or 100% Pharmacological activity (Fig. 6.17), while the resistant isolate has an amplitude of 0.9684, or 96.84% activity (Fig. 6.18).
Figure 6.6: IS of the protein residues of the resistant strain exposed to Fusion inhibitors (Enfuvirtide) showing amplitude of 0.9777 at position 3, indicating 97.77% Pharmacological affinity and resistance of 2.23% for amino acid sequences of the resistant strain at the point of interaction.
Figure 6.7: CIS of the protein residues of both susceptible and resistant strains exposed to Protease inhibitor (Darunavir) showing amplitude of 1.0 at position 39, indicating 100% Pharmacological affinity for both sequences at the point of interaction.
6.5
Discussions
Resistance arising from the exposure of 5 anti-retroviral agents to their target proteins is assessed by means of a Digital Signal Processing technique called the Informational Spectrum Method. For each drug, one amino acid scale that relates to its mechanism of action, and hence to resistance, is engaged. This is in order to further demonstrate that biological functionalities such as drug resistance can be computationally evaluated using sequence information. Three anti-HIV/AIDS agents have preliminarily been used to demonstrate the applicability of this procedure [92]. Computational assessments are known to be rational as they save time and resources. The consensus amino acid sequences of the HIV target proteins were retrieved from the Stanford University Release Notes [221] and other literature as cited. ISM is one of the DSP-based procedures that are being engaged in the analysis of proteins [33; 36; 58; 93]. The five amino acid scales engaged, as shown in Table 6.2, are amongst the 565 clinically and computationally derived amino acid scales that are deposited in a database [3]. No newly generated data are engaged, which shows that the materials and the technique have inherently been validated. Resistance offered by the mutated protein residues that have been in contact with drugs is demonstrated as decreased amplitude at the point of interaction in the frequency spectrum of the ISM. The protein residues of the susceptible strains are found to achieve higher amplitudes at the position of interaction than the protein residues of the resistant strains. This symbolizes decreased activity in the resistant strains. Fig. 6.19 shows the percentage decrease in susceptibility (Resistance) demonstrated for each anti-retroviral agent. By means of the hydrophobicity-based amino acid scale WILM950103, for example, Darunavir appears to demonstrate the highest resistance of 10.05%. The amino acid scales engaged are listed in Tab. 6.2. Lamivudine and Bevirimat have 5.33% and 6.48% reduced susceptibility, respectively, while Raltegravir and Enfuvirtide show the least resistance of 3.16% and 2.23%, respectively. This study is conducted using one amino acid scale for each drug.
Figure 6.8: IS of the protein residues of the susceptible strain exposed to Protease inhibitor (Darunavir) showing amplitude of 1.0 at position 39, indicating 100% Pharmacological affinity for amino acid sequences of the susceptible strain at the point of interaction.
Figure 6.9: IS of the protein residues of the resistant strain exposed to Protease inhibitor (Darunavir) showing amplitude of 0.8995 at position 39, indicating 89.95% Pharmacological affinity and resistance of 10.05% for amino acid sequences of the resistant strain at the point of interaction.
These outcomes, obtained using sequence information, are also found to be in accord with the initial clinical experiments, though the resistance recorded for each drug is found to be low. This is because only one amino acid scale is utilized. More than one amino acid scale has been implicated in a single point mutation, and the total resistance can be obtained as the sum of the resistance presented by each amino acid scale [70]. As a result, engaging all other relevant scales and point mutations is required to obtain the appropriate resistance offered to each drug.
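The quantities discussed above can be written compactly as follows. The symbols are introduced here for illustration only and do not appear in the original text: $A_{\mathrm{sus}}$ and $A_{\mathrm{res}}$ denote the IS amplitudes of the susceptible and resistant strains at the Consensus Frequency $f_{\mathrm{CF}}$, and $R_k$ is the resistance obtained with the $k$-th of $K$ amino acid scales. The additive form of the total is the one stated in the text with reference to [70].

```latex
\begin{align}
  R &= A_{\mathrm{sus}}(f_{\mathrm{CF}}) - A_{\mathrm{res}}(f_{\mathrm{CF}}), &
  \%R &= 100 \times R, &
  R_{\mathrm{total}} &= \sum_{k=1}^{K} R_k .
\end{align}
```

For Darunavir with the single scale WILM950103, for example, $R = 1.0 - 0.8995 = 0.1005$, which corresponds to the 10.05% reported in Table 6.2.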
6.6
Conclusions
In this study, it was demonstrated that resistance to a drug whose target amino acid sequence information is available can be computed by using the Informational Spectrum Method and appropriate amino acid scales. ISM is based on the discrete Fourier transform. The methodology developed was successfully applied to five classes of anti-HIV/AIDS drugs as a case study. They were the Fusion Inhibitors (e.g., Enfuvirtide), Protease Inhibitors (e.g., Darunavir), Nucleotide Reverse Transcriptase Inhibitors (e.g., Lamivudine), Maturation Inhibitors (e.g., Bevirimat (BVM)) and Integrase Enzyme Inhibitors (e.g., Raltegravir). The susceptible strains are found to demonstrate higher amplitudes than the resistant strains at the point of interaction (Consensus Frequency). This depicts a higher contribution from the susceptible strains. It is also observed that clinically derived outcomes agree with the ISM-based results, though the margins are found to be low. These findings therefore reveal that drug resistance can be calculated using sequence information and the appropriate amino acid scales involved. The materials and procedures employed in this study have an innately acquired validation. The assessment was carried out using only one amino acid scale and, as a result, the margins between the peaks identified are small. It has been identified that more than one amino acid scale is involved in a single point mutation. There are over 565 amino acid scales. It is therefore envisaged that other amino acid scales may provide complementary information and hence allow the resistance to be better identified. As a result, the total resistance exhibited for each drug can be obtained by aggregating the results using all the amino acid scales and point mutations involved.
Figure 6.10: CIS of the protein residues of both susceptible and resistant strains exposed to NRT inhibitor (Lamivudine) showing amplitude of 1.0 at position 40, indicating 100% Pharmacological affinity for both sequences at the point of interaction.
Figure 6.11: IS of the protein residues of the susceptible strain exposed to NRT Inhibitor (Lamivudine) showing amplitude of 1.0 at position 40, indicating 100% Pharmacological affinity for amino acid sequences of the susceptible strain at the point of interaction.
Figure 6.12: IS of the protein residues of the resistant strain exposed to NRT Inhibitor (Lamivudine) showing amplitude of 0.9467 at position 40, indicating 94.67% Pharmacological affinity and resistance of 5.33% for amino acid sequences of the resistant strain at the point of interaction.
Figure 6.13: CIS of the protein residues of both susceptible and resistant strains exposed to Maturation Inhibitors (Bevirimat) showing amplitude of 1.0 at position 165, indicating 100% Pharmacological affinity for both sequences at the point of interaction.
Figure 6.14: IS of the protein residues of the susceptible strain exposed to Maturation Inhibitors (Bevirimat) showing amplitude of 1.0 at position 165, indicating 100% Pharmacological affinity for amino acid sequences of the susceptible strain at the point of interaction.
Figure 6.15: IS of the protein residues of the resistant strain exposed to Maturation Inhibitors (Bevirimat) showing amplitude of 0.9352 at position 165, indicating 93.52% Pharmacological affinity and resistance of 6.48% for amino acid sequences of the resistant strain at the point of interaction.
Figure 6.16: CIS of the protein residues of both susceptible and resistant strains exposed to Integrase Enzyme Inhibitors (Raltegravir) showing amplitude of 1.0 at position 67, indicating 100% Pharmacological affinity for both sequences at the point of interaction.
Figure 6.17: IS of the protein residues of the susceptible strain exposed to Integrase Enzyme Inhibitors (Raltegravir) showing amplitude of 1.0 at position 67, indicating 100% Pharmacological affinity for amino acid sequences of the susceptible strain at the point of interaction.
Figure 6.18: IS of the protein residues of the resistant strain exposed to Integrase Enzyme Inhibitors (Raltegravir) showing amplitude of 0.9684 at position 67, indicating 96.84% Pharmacological affinity and resistance of 3.16% for amino acid sequences of the resistant strain at the point of interaction.
Figure 6.19: Percentage decrease in Pharmacological activities (Resistance) shown by anti-HIV/AIDS agents (1) Enfuvirtide, (2) Lamivudine, (3) Darunavir, (4) Bevirimat and (5) Raltegravir.
Chapter 7
Resonant Recognition Model-based Approach to Investigating Protein Binding Interactions
7.1
Summary
In this Chapter, the Resonant Recognition Model (RRM) technique is used to examine the binding interactions that exist between proteins. Plasmodial and host proteins are studied here. This is because the application of Bioinformatics procedures such as RRM to predicting biological functionalities and interactions has become rational and successful in inventing therapeutic interventions. Binding interactions between Plasmodial and host proteins that have been clinically established are first identified. They are then computationally assessed. The two sets of results are then compared to find out whether they correlate. This is in anticipation of developing an RRM-based large-scale predictive algorithm for binding interactions. Part of this research is published as:
Nwankwo N, Seker H. 2011. "Preliminary Investigations into the Binding Interactions between Plasmodial and Host Proteins Using Computational Approaches" J Proteomics Bioinform 4: 269-277. doi:10.4172/jpb.1000200.
7. Resonant Recognition Model-based Approach to Investigating Binding Interactions
The first step in any biomolecular interaction is binding. The molecules first bio-recognize each other and exert an attractive force towards each other before attachment can be achieved. Thereafter, a series of intra-molecular interactions follows. This depends on how the molecules relate. This relationship can be structural or physicochemical. As a result of the enormous deposits of genomic and proteomic information, such as peptides and protein residues, that are left for analysis, it has become more challenging to utilize only clinical procedures in assessing bio-functionalities such as binding interactions. Bioinformatics approaches that can undertake large-scale analysis of this genomic and proteomic information have become necessary. This is because experimental approaches to assessing these vast data are expensive, resource-wasting, time-consuming and laborious. In addition, computational approaches are found to help streamline, re-strategize and rationalize clinical assessment procedures. In this study, RRM is applied to 42 Plasmodial and 32 host proteins. This is to help predict binding interactions amongst these proteins and peptides. It is envisaged that these computational approaches will help initiate the development of an RRM-based algorithm for the large-scale prediction of protein binding and the study of bio-functionalities. Our findings appear to reveal that binding interactions are computationally established between proteins like the adhesive domain of the Plasmodial Circumsporozoite Protein (CSP) and the host protein Importin α-3. This is in contrast to the Ring-infected Erythrocyte Surface Antigen (RESA) and Spectrin, for which no interaction was computationally established although, clinically, they are known to interact. The results therefore appear to suggest that binding interactions could be predicted using RRM. In addition, this study reveals that the sequence-content-dependence observed in clinical interactions is also obtainable in computational assessments.
7.2
Introduction
Plasmodial species are the causative organisms of Malaria and are injected into the host blood stream by Anopheles mosquitoes during a blood meal [231; 232]. Malaria has remained an age-long disease which has defied complete cure as a result of drug resistance and repeated re-infection [233; 234]. In Plasmodium species alone, about 1543 proteins have been identified as being employed by the parasite for biological functionalities that include entry into the host, transmigration (gliding), infection, metabolism, immunogenicity and others [19]. Proteomic analysis of Plasmodium falciparum revealed 728 proteins, which constitute 4611 peptides, for these functionalities [19]. However, 42 Plasmodial and 32 host proteins are first studied. This is in order to establish that there exist binding interactions between Plasmodial and host proteins. Thereafter, the approach is to be applied to other organisms and viruses. Additionally, it is envisaged that an RRM-based large-scale program will be developed in order to support clinical assessments of binding interactions in biomolecules. The Plasmodial proteins that have already been identified include CSP Proteins [139; 235], Thrombospondin-Related Adhesive Protein (TRAP) [5], Merozoite Surface Proteins (MSPs) [236], Apical Membrane Antigens (AMAs) [237], Sporozoite Threonine and Asparagine Rich Protein (STARP) [238], Sporozoite And Liver Stage Asparagine-Rich Protein (SLARP) [239; 240], Secreted Ookinete Adhesive Protein (SOAP) [241], Knob-Associated Histidine-Rich Protein (KAHRP) [242], Liver Stage Antigens (LSAs) [243], and Sporozoite and Liver Stage Antigen (SALSA) [134]. Proteins which are utilized for host cell transmigration include Sporozoite microneme Protein Essential for Cell Traversal 1 and 2 (SPECT 1 and SPECT 2) [244], Cell Traversal protein for Ookinetes and Sporozoites (CelTOS) [245], Perforin-Like Proteins (PLPs) [246] and Membrane Attack Ookinete Proteins (MAOPs) [241]. Mutations in living organisms and viruses result in alterations in the protein residues. These alterations are acquired in an attempt to resist drugs or improve biological functionalities.
Consequently, newer proteins and peptides are generated, leading to a significant increase in the proteomic databases. These proteins and peptides need to be constantly investigated in clinical laboratories. This is to help understand whether they could suitably be engaged in designing and developing therapeutic interventions. Laboratory-based approaches have been noted to be expensive and resource-consuming and, as such, applying these approaches to the vast deposit of proteins and peptides will burden the already over-stressed, laborious, manual clinical procedures. This has therefore called for the application of computational approaches that will help streamline and complement clinical investigations in order to reduce cost and resources. Several clinical procedures, including Surface Plasmon Resonance [247], have been used to assess the binding interactions existing between drugs and their target proteins. Like other laboratory-based approaches, these procedures are recognized to be less rational in terms of cost and time [49; 50]. Therefore, a computational approach called RRM, which has been used to identify interactions that exist in protein molecules, is applied to the 42 sets of Plasmodial and 32 sets of host proteins. This is in order to predict binding interactions in the protein residues of Plasmodial and host proteins. The aim is therefore to find out whether protein binding can be computationally predicted. In this study, interaction is observed between proteins like CSP and Importin α-3. These proteins have clinically been ascertained to bind. They are also identified to share a common frequency in the RRM spectrum, which signifies that they bind to each other. This approach is to be applied to other species and, as such, engaged in order to rationalize clinical experimentation through predicting the binding interactions that exist between the proteins involved. RRM, a Digital Signal Processing-based physico-mathematical technique, which has been employed in the analysis of the biological functionalities of proteins and peptides, is used in these investigations. The alphabetic codes of the amino acid sequences of the Plasmodial and host proteins are digitized, processed with the Discrete Fourier Transform and then point-wise multiplied. A common peak, which symbolizes the point of common biological activity, is obtained as the Consensus Frequency (CF). The CFs of the Plasmodial and host protein residues are obtained and analyzed so as to identify the Plasmodial proteins that share the same point of interaction with the host proteins.
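The digitize, transform and multiply procedure described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions rather than the algorithm developed in this work: the EIIP values are those commonly tabulated in the RRM literature and should be checked against Chapter 3, shorter sequences are zero-padded to the length of the longest one, the mean is removed before the transform, and the function names, the tolerance `tol` and the toy sequences are all hypothetical.

```python
import numpy as np

# EIIP scale as commonly tabulated in the RRM literature
# (assumption: verify against the table given in Chapter 3).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}

def rrm_spectrum(seq, n):
    """Digitize a sequence with EIIP and return |DFT| over n points."""
    signal = np.array([EIIP[residue] for residue in seq], dtype=float)
    signal = signal - signal.mean()               # assumption: remove the offset
    return np.abs(np.fft.rfft(signal, n=n))[1:]   # zero-pads to n, drops bin 0

def consensus_frequency(seqs):
    """Point-wise multiply the spectra and return the peak frequency (0 to 0.5)."""
    n = max(len(s) for s in seqs)
    cross = np.ones(n // 2)
    for s in seqs:
        cross *= rrm_spectrum(s, n)
    peak_bin = int(np.argmax(cross)) + 1
    return peak_bin / n

def shares_cf(group_a, group_b, tol=0.01):
    """Hypothetical criterion: two groups share a CF if their peaks lie within tol."""
    return abs(consensus_frequency(group_a) - consensus_frequency(group_b)) <= tol

# Hypothetical toy sequences, not real Plasmodial or host proteins.
toy_plasmodial = ["MKNLAVIGSWERTYPLK", "MKQLAVIGTWDRTYPLKAA"]
toy_host = ["GSWERTMKNLAVIYPLK"]
print(consensus_frequency(toy_plasmodial), consensus_frequency(toy_host))
print(shares_cf(toy_plasmodial, toy_host))
```

Because each group is padded to its own longest sequence, the frequency grids of two groups generally differ, which is why the comparison uses a tolerance instead of exact equality.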
Proteins that share the same CF are known to bio-recognize and interact [35]. Our results reveal that proteins such as the adhesive domain of the CSP and Importin α-3 [248], which are known to interact clinically, share a common point of interaction. This signifies that the binding interaction between the two proteins can be predicted. On the other hand, proteins such as RESA, which is recognized to interact with Spectrin [249], did not share a common point of interaction using our computational approach. This may be attributed to the protein components used. There are over 1000 protein residues in RESA and 2000 in Spectrin, with over 19 motifs, each demonstrating a series of independent bio-functionalities. It has been observed that, in the laboratory, protein interactions are sequence-content-dependent [250]. Specific protein components therefore need to be engaged so that the interacting domains of RESA and Spectrin can be identified. This calls for verification of the fact that computational assessment of protein interaction may be sequence-content-dependent. The methodologies engaged, the results obtained and the inferences drawn, as well as the conclusions made, are presented in the subsequent sections.
7.3
Materials and Methods:
7.3.1
Materials:
In order to identify the materials in use for these investigations, preliminary clinical interactions that exist between proteins and peptides from Plasmodia and host target proteins are first briefly discussed. The amino acid sequences of these proteins are then retrieved from UNIPROT [79] or the literature and further analysed using RRM. This is in order to predict binding interactions between the Plasmodial proteins and their target proteins and possibly apply this procedure to other protein-protein interactions. The binding interactions of the Plasmodial and host proteins and peptides which have clinically been identified are investigated in this study using RRM. The study is divided into 12 groups and the results are presented in Section 7.4. The groups are:
1. Circumsporozoite (CSP) Protein and Importin α-3
The principal Plasmodial sporozoite surface protein employed for attachment and other interactions with host proteins has been identified as CSP [248; 251]. The CSP has been acknowledged to bind effectively to Importin α-3, a binding interaction that is found to be abrogated when the 9 amino acid residues which constitute the Nuclear Localization Signal (NLS) are removed [248]. The interaction between CSP and Importin α-3 is computationally investigated in this study. To help predict the interaction between CSP and Importin α-3, RRM is engaged.
2. Region 11 of CSP and Laminin γ-1
An amino acid domain analogous to the 18 protein residues (EWSPCSVTCGNGIQVRIK) has been identified and referred to as CSP Region 11 (CSP R11) [251; 252]. This domain has been found to bind to Laminin γ-1 [248; 253]. The binding interaction that exists between the 18 amino acid-long peptide from the CSP Region 11 and Laminin γ-1 is computationally investigated to find out if strong or weak binding features can be observed.
3. Plasmodium falciparum Erythrocyte Membrane Protein 1 (PfEMP1) and Host Cytoadherence Receptors
Plasmodial PfEMP1, which is about 3,000 amino acids long, is associated with structural motifs (domains) such as the N-Terminal Segment (NTS), Duffy Binding Like (DBL), Cysteine-Rich Interdomain (CIDR), Transmembrane (TM), and Acidic Terminal Segment (ATS) [250]. Clinically, binding interaction in the Plasmodial PfEMP1 has been identified to be sequence-content-dependent, in that a specific domain is found to bind to a specific protein [250]. For example, DBL α-1 is known to bind to Complement Receptor Type 1 (CR1). CIDR1 α is observed to interact with CD36 and CD31, while DBL2 β adheres to ICAM-1. The protein residues of DBL α-1, CR1, CIDR1 α, CD36, CD31, DBL2 β and ICAM-1 are assessed in an attempt to predict binding interactions between them. This will help authenticate the fact that, out of the 3,000 amino acid-long PfEMP1, the sequence-content-dependent interaction recognized in the clinical setting also applies computationally, and strengthen the view that prediction of protein binding by means of RRM is achievable.
4. DBPR11 of Plasmodium vivax and Plasmodium knowlesi and DARC
In P. vivax, the ligand-receptor interaction between the Plasmodial protein called Duffy Binding Protein (DBP), found on the merozoites, and the host receptor site on the Red Blood Cell (Erythrocyte) called Duffy Antigen Receptor for Chemokines (DARC) is recognized to bring about plasmodial invasion of the Erythrocytes and Liver [254; 255]. This interaction has been found to occur between the conserved cysteine-rich Duffy Binding Protein Region 11 (DBPR11), which has about 330 amino acid residues, and the DARC [256; 257]. To help find out whether the binding interaction between plasmodial DBPR11 and host DARC can be predicted, RRM is engaged.

5. Duffy Binding-like families of Plasmodium falciparum
Invasion by Plasmodium falciparum requires numerous receptor-ligand interactions [258], involving the Duffy Binding-like (DBL) families and the Reticulocyte Binding Homologues (RH) [254]. The DBL family consists of 5 Plasmodium falciparum homologues of the Plasmodium vivax (Pv) protein PvDBP, namely EBA-175, EBA-140, EBA-181, EBL-1 and EBA-165 [259]. Similarly, the Reticulocyte Binding Homologues (PfRH) family encompasses the six homologues of the Plasmodium vivax (Pv) Reticulocyte Binding Proteins (RBP), which include PfRH1, PfRH2a, PfRH2b, PfRH3, PfRH4 and PfRH5 [254; 259; 260]. Interaction of the Plasmodial Erythrocyte Binding Antigens (EBAs) and Reticulocyte Binding Homologues (RH) with the host RBC has been identified as Sialic Acid independent, which is facilitated by the CR1 [261]. The RRM technique is applied here to find out whether interaction between EBA-175, EBA-140, EBA-181 and EBA-165 and PfRH2b, PfRH3, PfRH4 and PfRH5 can be predicted using a computational method.

6. Apical Merozoite Antigens (AMAs) and RON proteins
AMA-1, a micronemal organelle protein, engages the host Rhoptry Neck (RON) proteins.
This interaction results in the formation of a complex called the Moving Junction (MJ) [262; 263] - [266]. In Plasmodium falciparum, three RONs, namely PfRON2, PfRON4 and PfRON5, are found to interact with the Plasmodium falciparum AMA-1 (PfAMA-1) to form the Moving Junction. The Moving Junction assists the merozoite in attacking the Erythrocytes [265]. A computational approach is engaged here in an attempt to find out whether binding interaction between AMA-1 and the host PfRON2 and PfRON5 proteins can be predicted.

7. Merozoite Surface Antigen 3 (MSA-3) and Acidic Basic Repeat Antigen (ABRA)
Plasmodial merozoite antigens include Merozoite Surface Antigens 1-5. MSA-3 of Plasmodium falciparum is a well-protected protein identified to be involved in the movement of proteins onto the merozoite surface via interaction with the Acidic Basic Repeat Antigen (ABRA), also referred to as Merozoite Surface Protein 9 (MSP 9) [236]. RRM, a bioinformatics approach, is engaged here to find out whether binding interaction between MSA-3 and ABRA can be predicted.

8. Sporozoite Surface Protein (SSP2) and CD8 T Lymphocytes
SSP2 has been defined as a TRAP homologue from P. yoelii [5; 267; 268]. Sporozoite Surface Protein 2 (SSP2) is reported to target CD8 T Lymphocytes [233]. Clinical interactions between SSP2 and CD8, as well as CD4, have been confirmed by field studies carried out in Gambia [269]. To find out whether binding interaction between SSP2 and the host CD4 and CD8 T Lymphocytes can be predicted, RRM is applied.

9. Plasmodial Transmigration Proteins
Plasmodial transmigration into the host cells involves Cell Entry and Traversal processes [244; 270]. Numerous proteins have been implicated in both Plasmodial cell entry and traversal.
They include Sporozoite microneme Protein Essential for Cell Traversal (SPECT) [244; 270]; Perforin-like Protein 1 (PLP1), also known as SPECT 2, which is found to embody a perforin-like domain found in mammals called the Membrane Attack Complex (MAC) [244]; Cell Traversal protein for Ookinetes and Sporozoites (CelTOS) [245]; Membrane Attack Ookinete Protein (MAOP) [241]; Phospholipases (PL)
like PLP2, PLP3, PLP4 and PLP5 [246], and the TRAP-Like Protein (TLP) [271; 272]. It has been reported that SPECT-2 shares sequence similarity with both Human Perforin and Complement Receptor Type 9 [244]. Perforin and Complement Receptor Type 9 are pore-forming proteins of the Plasmodial hosts involved in the disruption of the cell membranes of viruses and micro-organisms [273], UNIPROT [79]. SPECT-2, Human Perforin and Complement Receptor Type 9 are investigated using RRM to find out whether binding interaction exists amongst them.

10. The Plasmodial Host Receptors and Intermediates
Sporozoite entry into the Parasitophorous Vacuole (PV) is found to be mediated by at least two hepatocyte surface molecules of the host, the tetraspanin Cluster of Differentiation 81 (CD81) and Scavenger Receptor B1 (SRB1) [240; 274]. It has also been observed that the sporozoite mutants that lack the migratory capacity are phagocytized by CD11b [240]. Some of the parasites that successfully traverse into the bloodstream are also drawn into the lymph nodes and phagocytized by CD11c [240]. While transmigrating, some are reported to be lost in the Kupffer Cells (KCs) by phagocytosis [240]. RRM is used to find out whether interaction between any Plasmodial protein and the host CD11c, CD81 and SRB1 can be predicted.

11. Inhibitory Activities of Melanoma Growth Stimulatory Activity (MGSA) and IL-8
Chemokines that are found to interact with the DARC are also discovered to have inhibitory activity on the adhesion of the PvDBP to the Erythrocyte Binding Protein (EBP) of the human erythrocytes. This inhibitory activity arises from competitive antagonism, which is achieved through a motif of 35 amino acid residues. They are therefore regarded as blockers of Red Blood Cell Plasmodial invasion [142] in the Duffy-positive phenotypes [256; 275]. These Chemokines include Interleukin-8 (IL-8) and Melanoma
growth stimulatory activity (MGSA) [142]. Interactions between Interleukin-8 (IL-8), Melanoma growth stimulatory activity (MGSA), DBP and EBP are therefore predicted using RRM.

12. Liver Stage Antigen-3 (LSA-3) and Interferon-gamma
Clinical experiments have demonstrated that Interferon-gamma (IFN-γ) inhibits Liver Schizogony of Plasmodium falciparum [276]. Vaccine preparation using human P. falciparum LSA-3 has demonstrated that IFN-γ provided the protection experienced in the pre-erythrocytic stage, suggesting interaction between the LSA-3 peptide and the paratope that produced IFN-γ [276]. Prediction of interaction between LSA-3 and IFN-γ is attempted using RRM.
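The RRM step that each of the twelve groups relies on can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used in this thesis: the EIIP values are the commonly tabulated ones and should be checked against the scale actually deposited in the database; removing the mean, zero-padding to the longest sequence and multiplying magnitude spectra to form the cross-spectrum are assumed processing choices; and the function names (encode, magnitude_spectrum, consensus_frequency) are hypothetical.

```python
import cmath

# EIIP values as commonly tabulated for the 20 amino acids (assumption:
# verify against the scale table used in the thesis before relying on them).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode(seq, n):
    """Map residues to EIIP values, remove the mean, zero-pad to length n."""
    values = [EIIP[aa] for aa in seq]
    mean = sum(values) / len(values)
    return [v - mean for v in values] + [0.0] * (n - len(values))


def magnitude_spectrum(signal):
    """|DFT| at positions 1 .. n//2 (position 0, the DC term, is skipped)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(1, n // 2 + 1)
    ]


def consensus_frequency(sequences):
    """Return (peak position, N, CF) with CF = peak position / N."""
    n = max(len(s) for s in sequences)  # N is the length of the longest sequence
    cross = [1.0] * (n // 2)
    for s in sequences:
        spec = magnitude_spectrum(encode(s, n))
        cross = [c * m for c, m in zip(cross, spec)]  # point-wise product
    peak_index = max(range(len(cross)), key=cross.__getitem__)
    peak_position = peak_index + 1  # spectrum starts at position 1
    return peak_position, n, peak_position / n


if __name__ == "__main__":
    # The two short peptide sequences quoted in this chapter, used only as input.
    print(consensus_frequency(["EWSPCSVTCGNGIQVRIK", "EWSQCNVTCGSGIRVRKRK"]))
```

Because the peak is searched only up to position N/2, the CF returned by this sketch always lies between 0 and 0.5, which matches the range used to organise the results.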
7.4 Results
RRM is first employed to obtain the position of common biological activity, called the Consensus Frequency (CF), of the 42 Plasmodial and 32 host proteins. Proteins that share a common CF have also been found to bio-recognize and bio-attach [35]. Using this principle, the CFs of all the Plasmodial proteins are studied against those of the host. The CFs of the Plasmodial and host proteins that have clinically been identified to interact are first categorized into 12 groups. The results are presented in tables and figures (Tab.s 7.1 - 7.4 and Fig.s 7.2 - 7.7). First is the summary of the results, which presents only the proteins in each group that have clinically been identified to interact and are found to share similar CF (Table 7.1). For example, the Plasmodial protein EBA-181 is found to share similar CF with PfRH2: EBA-181 has CF at 0.097, while PfRH2 demonstrated CF at 0.092. The results obtained in this study are further categorized into three groups using their CFs (Tab.s 7.2 - 7.4). Plasmodial and host proteins which have CF within the range of 0.000 to 0.200 are shown in Tab. 7.2. Tab. 7.3 is the list of proteins with CF within the range of 0.201 to 0.300. Proteins with the CF
Table 7.1: Summary of the computational results from the group of clinically interacting Plasmodial and host proteins.

| Group | Plasmodial | CF | Host | CF |
| --- | --- | --- | --- | --- |
| 1 | CSP | 0.359 | Importin α 3 | 0.332 |
| 2b | CSP R11 | 0.368 | CD36 | 0.346 |
| 3c | CIDR α | 0.459, 0.131 | CD31 | 0.174 |
| 4a | DBP | 0.133 | DARC | 0.170 |
| 5b | EBA-165 | 0.214 | PfRH4 | 0.267 |
| 5c | EBA-181 | 0.097 | PfRH2 | 0.092 |
| 6a | AMA-1 | 0.276 | PfRON5 | 0.288 |
| 6c | AMA-1 | 0.276 | Interleukin | 0.258 |
| 9a | SPECT microneme | 0.110 | Perforin | 0.187 |
| 9b | SPECT (Traversal) | 0.290 | CCR9 | 0.178 |
| 10a | Intermediate | | SRB1 | 0.265 |
| 10b | Intermediate | | CD81 | 0.199 |
| 10c | Intermediate | | CD11c | 0.187 |
in the range of 0.301 to 0.500 are shown in Table 7.4. These results are further discussed in section 7.6. Fig.s 7.1 and 7.2 are the results of the CS of AMA-1, the RON complexes of the Plasmodium and Interleukin. Analysis of the results revealed that, amongst the proteins listed in Tab. 7.3, AMA-1 (0.276), which is known to interact with the RON complexes of the Plasmodium during merozoite invasion [258; 277], shares similar CF with them. The RON of Plasmodium falciparum, PfRON5, has CF of 0.288 (Fig. 7.1 and Table 7.1). AMA-1 is also known to interact with Interleukin 9 [278], and in this study they both share similar CF. The CF of AMA-1 is at 0.276 while the CF of Interleukin is at 0.258 (Fig. 7.2 and Table 7.1). In addition, Fig.s 7.4 and 7.3 are the results of the CS of the adhesive domain of the CSP, Importin α-3, the conserved region (EWSQCNVTCGSGIRVRKRK) of the Thrombospondin (TSP) found in the CSP Region 11 (CSPR11) and Laminin γ 1. The results presented in Tab. 7.1 and Fig. 7.3 disclose that the adhesive domain of the CSP, which possesses CF of 0.359, has clinically been identified
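Table 7.2: CF of the clinically interacting Plasmodial and host proteins from 0.000 to 0.200.

| Source | Protein | Consensus Frequency |
| --- | --- | --- |
| Plasmodial | EBA-175 | 0.180 |
| Plasmodial | EBA-140 | 0.192 |
| Plasmodial | HDP | 0.135 |
| Plasmodial | STARP | 0.199 |
| Plasmodial | SPATR | 0.135 |
| Plasmodial | SPECT microneme | 0.110 |
| Plasmodial | Vivapain 1 | 0.146 |
| Host | CD11c | 0.187 |
| Host | DARC | 0.170 |
| Host | Eselectin | 0.175 |
| Host | LRP12 | 0.184 |
| Host | PECAM | 0.174 |
| Host | Peselectin | 0.179 |
| Host | Perforin | 0.187 |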
to interact with Importin α-3. Importin α-3 has CF of 0.332. Additionally, the CSPR11 is known to interact with the Cluster of Differentiation 36 (CD36) [279]. Clinically, Laminin γ 1 is found to interact with the CSPR11. That could not be established in this study. However, it is shown that the CF of CSPR11 (0.368) and that of CD36 (0.346) are similar (Fig. 7.4), suggesting binding interaction, though this has not been clinically studied. Additionally, Tab.s 7.2 - 7.4 show Plasmodial and host proteins that are found to share similar CF, though clinical investigations have not been carried out to find out whether they interact. They include EBA-175, with CF of 0.180, and CD11c, which has CF of 0.187. This is unlike the Plasmodial protein Ring-infected Erythrocyte Surface Antigen (RESA) and Spectrin, which have been recognized to interact clinically
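Table 7.3: CF of the clinically interacting Plasmodial and host proteins from 0.201 to 0.300.

| Source | Protein | Consensus Frequency |
| --- | --- | --- |
| Plasmodial | AMA-1 | 0.276 |
| Plasmodial | EBA-165 | 0.214 |
| Plasmodial | MSA-4 | 0.246 |
| Plasmodial | Plasmepsin V | 0.280 |
| Plasmodial | RESA | 0.250 |
| Plasmodial | SPECT for Traversal | 0.290 |
| Plasmodial | Vivapain 4 | 0.298 |
| Host | CD8 alpha | 0.290 |
| Host | Complement Receptor 8 | 0.258 |
| Host | CD11b | 0.249 |
| Host | CXCR1 | 0.300 |
| Host | DARC | 0.270 |
| Host | Importin alpha-1 | 0.260 |
| Host | Interleukin | 0.258 |
| Host | Laminin gamma-1 | 0.284 |
| Host | PfRH4 | 0.276 |
| Host | PfRON4 | 0.288 |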
[249; 280]. The CF of RESA is dissimilar to those of both the Spectrin-alpha and Spectrin-beta subunits (Fig. 7.5). This appears to have arisen because the specific protein component of the RESA or Spectrin responsible for the interaction is not engaged. This outcome is further discussed below. Clinical and computational assessments of the two proteins that constitute the HIV Envelope protein (HIV gp160) and its target protein have also preliminarily shown that interaction between them is sequence-content-dependent. HIV gp160 consists of the Surface protein (gp120) and the Transmembrane protein (gp41).
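Table 7.4: CF of the clinically interacting Plasmodial and host proteins from 0.301 to 0.500.

| Source | Protein | Consensus Frequency |
| --- | --- | --- |
| Plasmodial | CelTOS | 0.398 |
| Plasmodial | CSP | 0.359 |
| Plasmodial | CSPR11 | 0.368 |
| Plasmodial | MSA-1 | 0.327 |
| Plasmodial | PLP5 | 0.379 |
| Plasmodial | SPECT-2 | 0.347 |
| Plasmodial | Vivapain 2 | 0.378 |
| Host | CD36 | 0.346 |
| Host | CCR4 | 0.312 |
| Host | Follistain | 0.350 |
| Host | HLA DR types 2 | 0.306 |
| Host | Importin α-3 | 0.332 |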
Clinically, only the HIV surface protein called gp120 is known to interact with the host CD4 [281]. Computationally, only HIV gp120 is known to share the same CF with the CD4, signifying binding interaction [98]. Neither the entire parent protein, HIV gp160, nor the other constituent, gp41, is found to share the same CF with the CD4 [35]. These findings, shown in Fig.s 7.6 and 7.7, therefore appear to support the fact that sequence-content-dependence in interaction is obtained in both clinical and computational assessments. On the other hand, Ring-infected Erythrocyte Surface Antigen (RESA) and Spectrin, which are known to bind, failed to disclose interaction. This may be explained in terms of the protein residues engaged. It has preliminarily been established that binding interaction amongst these domains is sequence-content-dependent [250]. In an attempt to explain this outcome, the sequences of the three motifs that make up the RESA are obtained and further studied. The complete sequence of RESA (such as obtained from the Plasmodium falciparum isolate
Figure 7.1: The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) AMA-1 at position 165 or CF=0.276 and (B) Interleukin at position 64 or CF=0.258.
Figure 7.2: The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) the AMA-1 at position 165 or CF=0.276 and (B) PfRON5 at position 333 or CF=0.288.

FC27 / Papua New Guinea with accession no P13830), which has over 1000 protein residues, consists of three motifs [79]. They are J, Tandem Repeats 1 and Tandem Repeats 2. The RRM-based analysis of the RESA is shown in Table 7.5. While Spectrin β is found to have over 2000 protein residues and comprises
Figure 7.3: The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) CSP Region 11 at position 19 or CF=0.359 and (B) Importin α-3 at position 173 or CF=0.332.
Figure 7.4: The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) CSP Region 11 (conserved region) at position 7 or CF=0.368 and (B) CD36 at CF=0.346.

of 20 domains, Spectrin α is recognized to have about 2500 protein residues and 27 domains. The analysis of these motifs is carried out in order to identify the domains that interact with the three domains of the RESA. The result is presented in Table 7.6. Interactions are found to be sequence-content-dependent
Figure 7.5: The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) Ring-infected Erythrocyte Surface Antigen (RESA) at CF=0.25 and (B) the Spectrin-β at position 985 or CF=0.382.
Figure 7.6: The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) HIV gp160 at position 155 or CF=0.185, (B) the HIV gp120 at position 18 or CF=0.0354 and (C) the gp41 at position 64 or CF=0.186.

[250]. Each motif in the RESA and Spectrin α or β is expected to have different biological characteristics. Interactions between domains are expected to be independent, as clinical interactions have been identified to be sequence-content-dependent [250]. The results are further discussed in section 7.5.
Figure 7.7: The results of the CS analyses showing Amplitude of 1.00 each at the CF by (A) HIV gp120 at position 18 or CF=0.0354 and (B) host CD4 at position 18 or CF=0.0373.

Table 7.5: Results of the Cross-Spectral analysis of RESA showing the Consensus Frequencies of each domain. N is the length of the longest sequence.

| Domain | Peak Position | N | Consensus Frequency |
| --- | --- | --- | --- |
| J | 14 | 69 | 0.203 |
| Tandem Repeats 1 | 7 | 69 | 0.102 |
| Tandem Repeats 2 | 23 | 183 | 0.126 |

7.5 Discussions
In this study, 42 Plasmodial and 32 host target proteins are analyzed using the RRM in order to find out whether binding interaction can be predicted. Proteins like the adhesive domain of the CSP and the Importin α-3, which have clinically been identified to bind, are observed to interact computationally. Apart from the clinically identified binding interaction within each group, there are also indications of other interactions between the Plasmodial and host proteins within each group which are yet to be clinically identified. Investigation into such should include Plasmodial and host intermediate proteins like EBA-175, with CF of 0.180, and CD11c, which has CF of 0.187. Others include
EBA-140, which has 0.192, and CD81, with 0.199, as well as AMA-1, with CF of 0.276, and SRB1, with 0.265. Sequence-content-dependence in clinical protein-protein interactions has been recognized [250]. The Plasmodial protein PfEMP1, variant 1 of the strain MC (UNIPROT Identity Q25733), is identified to have a sequence length of over 3,000 [79; 250]. This long sequence comprises numerous groups of shorter sequences that make up structural motifs. The motifs include the N-Terminal Segment (NTS), DBL, CIDR, TM and Acidic Terminal Segment (ATS) [250]. Clinically, each motif has been found to interact with a specific host protein. In essence, it has clinically been found that binding interactions amongst these domains and the host proteins are sequence-content-dependent [250]. For example, DBL α-1 is known to bind to CR1, while CIDR1 α interacts with CD36 and CD31. DBL2 β adheres to ICAM-1. Also, it has been confirmed that the 170-amino-acid CIDR α binds to the CD36 [282]. In this study, this approach is used to find out whether sequence-content-dependent protein-protein interactions can be established computationally. The protein residues extracted from the parent protein PfEMP1, called CIDR α, have two distinct frequencies at 0.459 and 0.131. The CF of the CD36 is at 0.174. Using the entire sequence of the PfEMP1, the CF is obtained as 0.24, which is different from those of CIDR α and CD36. This appears to support the fact that proteins should be fragmented into their smallest units and analyzed in order to identify the interacting proteins, instead of using large parent proteins. This therefore suggests that computational assessment of protein interaction is sequence-content-dependent. Based on the results obtained using the RRM technique, as shown in Tab.s 7.5 and 7.6, the CFs of the three domains of the RESA are 0.203 (domain J), 0.102 (Tandem Repeats 1) and 0.126 (Tandem Repeats 2).
The RRM procedure stipulates that proteins which share the same or similar CF have common bio-functionality in terms of bio-recognition and bio-attachment [35]. Tab. 7.6 shows that the J motif (0.203) shares similar CF with four domains of the Spectrin α, namely domains 2 (0.192), 3 (0.192), 5 (0.200) and 6 (0.210). On the other hand, the Tandem Repeats 1 of the RESA shares similar CF (0.102) with domain 12 (0.111) of Spectrin β, while Tandem Repeats 2 of the RESA shares similar CF (0.126) with domain 12 (0.111) of Spectrin β. These
suggest interactions. However, clinical investigations are required to verify these computational results. Preliminarily, sequence-content-dependence in both clinical and computational assessments has been observed in the HIV envelope protein (HIV gp160). The host CD4 is known to interact only with the HIV surface protein (gp120), both clinically [281] and computationally [98]. HIV gp120 and HIV gp41 (Transmembrane) are the components of the HIV gp160. As shown earlier, neither HIV gp41 nor HIV gp160 shares similar CF with the host CD4. It is noted in this study that the Plasmodial and host target proteins engaged have clinically been studied as referenced. They are analyzed by using RRM, which is a procedure that has been employed in the study of over 1000 proteins [35]. In addition, the amino acid scale engaged, which is called EIIP, has also been clinically and computationally derived and deposited in the database [3; 35]. This therefore shows that validation of the technique and the materials utilized in this study is innately acquired.
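The comparison of the J motif with the Spectrin α domains can be sketched as a simple tolerance check on the reported CFs. The tolerance of 0.015 is an illustrative assumption, not a threshold stated in this study, and the function name similar_cf_domains is hypothetical; the CF values are those quoted for domains 1 to 10 of Spectrin α in Table 7.6.

```python
# CFs reported in Table 7.6 for Spectrin alpha domains 1-10.
SPECTRIN_ALPHA_CF = {
    1: 0.308, 2: 0.192, 3: 0.192, 4: 0.305, 5: 0.200,
    6: 0.210, 7: 0.305, 8: 0.429, 9: 0.429, 10: 0.300,
}

# CF reported in Table 7.5 for the J motif of RESA.
RESA_J_CF = 0.203


def similar_cf_domains(query_cf, domain_cfs, tolerance=0.015):
    """Return the domain numbers whose CF lies within `tolerance` of query_cf.

    The tolerance is an assumed, illustrative threshold for "similar CF".
    """
    return sorted(
        domain for domain, cf in domain_cfs.items()
        if abs(cf - query_cf) <= tolerance
    )


if __name__ == "__main__":
    print(similar_cf_domains(RESA_J_CF, SPECTRIN_ALPHA_CF))
```

With this assumed tolerance the sketch selects the same four domains named in the discussion; a tighter or looser threshold would change the candidate list, which is one reason the computational results still need clinical verification.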
7.6 Conclusions
Post-genomic Bioinformaticians now face the problem of analyzing millions of protein residues arising from mutations in micro-organisms and viruses for purposes of therapeutic interventions. This problem is compounded by the fact that clinical approaches are labour-intensive, expensive, time-consuming and resource-wasting. Clinical assessments of these millions of protein residues are not feasible. Therefore computational approaches are necessary to streamline and re-organize clinical confirmation. They are also expected to help predict binding interactions, not only in the Plasmodial proteins but in all species. In this study, a Digital Signal Processing technique called RRM was applied to several Plasmodial proteins and the host target proteins in order to predict bio-functionalities of Plasmodial proteins. Using these inherently validated methods and materials, it is observed that our computational results correlated with initial clinical findings. For example, the Plasmodial CSP and host Importin α-3, which clinically bind to each other, are observed to interact computationally. However, this is not the same with the Ring-infected Erythrocyte Surface
Antigen (RESA) and the Spectrin. This could be a result of the fact that interactions are known to be sequence-content-dependent. Proteins consist of biologically active domains with specific biological functionalities. There are 20 domains in the Spectrin β and 27 in the Spectrin α. There are three motifs in the RESA. Interactions are computationally predicted amongst the J motif of the RESA and four domains in the Spectrin α, though they need to be investigated clinically. It therefore appears that the protein residues used in the initial experiment are not the specific residues involved in the clinical interaction. Clinical assessments of protein-protein interaction are recognized to be sequence-content-dependent. Using the computational approach, the domain of the CIDR α and CD36 are found to share similar CF. This appears to signify binding interaction. However, the parent protein, PfEMP1, is found to have dissimilar CF. This therefore suggests that the computational approach used has demonstrated that protein-protein interaction between CIDR α and CD36 is sequence-content-dependent (specific). This has preliminarily been observed between the HIV envelope proteins and the host protein CD4.
Table 7.6: Results of the Cross-Spectral analysis of Spectrin α and β showing the Consensus Frequencies of each domain. PP stands for peak position or position of the maximum amplitude; N is the length of the longest sequence; and CF refers to the Consensus Frequency.

| Spectrin α | PP | N | CF | Spectrin β | PP | N | CF |
| --- | --- | --- | --- | --- | --- | --- | --- |
| domain 1 | 4 | 13 | 0.308 | domain 1 | 30 | 109 | 0.275 |
| domain 2 | 20 | 104 | 0.192 | domain 2 | 10 | 114 | 0.088 |
| domain 3 | 20 | 104 | 0.192 | domain 3 | 39 | 109 | 0.356 |
| domain 4 | 32 | 105 | 0.305 | domain 4 | 33 | 106 | 0.311 |
| domain 5 | 21 | 105 | 0.200 | domain 5 | 24 | 105 | 0.229 |
| domain 6 | 22 | 105 | 0.210 | domain 6 | 47 | 106 | 0.443 |
| domain 7 | 32 | 105 | 0.305 | domain 7 | 19 | 107 | 0.178 |
| domain 8 | 45 | 105 | 0.429 | domain 8 | 52 | 107 | 0.486 |
| domain 9 | 45 | 105 | 0.429 | domain 9 | 17 | 106 | 0.160 |
| domain 10 | 21 | 70 | 0.300 | domain 10 | 43 | 105 | 0.410 |
| SHB | 17 | 60 | 0.283 | | | | |
| domain 11 | 23 | 100 | 0.23 | domain 11 | 52 | 106 | 0.490 |
| domain 12 | 5 | 105 | 0.048 | domain 12 | 11 | 99 | 0.111 |
| domain 13 | 15 | 108 | 0.139 | domain 13 | 30 | 106 | 0.283 |
| domain 14 | 29 | 104 | 0.29 | domain 14 | 29 | 107 | 0.271 |
| domain 15 | 5 | 106 | 0.047 | domain 15 | 47 | 106 | 0.443 |
| domain 16 | 31 | 105 | 0.295 | domain 16 | 1 | 106 | 0.009 |
| domain 17 | 6 | 105 | 0.057 | domain 17 | 35 | 106 | 0.330 |
| domain 18 | 41 | 108 | 0.380 | Actin-binding | 62 | 274 | 0.230 |
| domain 19 | 34 | 106 | 0.321 | CH 1 | 24 | 105 | 0.229 |
| domain 20 | 43 | 105 | 0.410 | CH 2 | 37 | 103 | 0.360 |
| domain 21 | 23 | 102 | 0.226 | | | | |
| FE-hand 1 | 16 | 36 | 0.444 | | | | |
| FE-hand 2 | 14 | 36 | 0.389 | | | | |
| FE-hand 3 | 12 | 35 | 0.343 | | | | |
| Ca+ binding 1 | 3 | 12 | 0.25 | | | | |
| Ca+ binding 2 | 3 | 12 | 0.25 | | | | |
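The tabulated values are consistent with the CF being the peak position divided by N. A small check on the three RESA rows of Table 7.5 is sketched below; the function name cf_from_peak and the 0.001 comparison tolerance (to allow for rounding in the tables) are assumptions made for illustration.

```python
def cf_from_peak(peak_position, n):
    """Consensus Frequency as peak position divided by N (longest sequence length)."""
    return peak_position / n


# (domain, peak position, N, tabulated CF) taken from Table 7.5.
RESA_ROWS = [
    ("J", 14, 69, 0.203),
    ("Tandem Repeats 1", 7, 69, 0.102),
    ("Tandem Repeats 2", 23, 183, 0.126),
]


def max_deviation(rows):
    """Largest absolute difference between computed and tabulated CF."""
    return max(abs(cf_from_peak(pp, n) - cf) for _, pp, n, cf in rows)


if __name__ == "__main__":
    for name, pp, n, cf in RESA_ROWS:
        print(name, round(cf_from_peak(pp, n), 4), cf)
```

The same relation can be used to sanity-check any row of Table 7.6 before it is compared against another protein's CF.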
Chapter 8
Continuous Wavelet Transform-Based Prediction of Connecting Peptides: HIV gp41 Core Protein and 1DF5 as a Case Study

8.1 Summary
The Wavelet Transform (WT) Method is another Digital Signal Processing technique which has been employed in the analysis of proteins. It consists of the Continuous Wavelet Transform (CWT) and Discrete Wavelet Transform (DWT) procedures. The Wavelet Transform Method is briefly described in Chapter 3, while a detailed explanation and demonstration of the methodology is given in Chapter 4. In this Chapter, the CWT technique is used to identify the connecting peptides between two helices in the HIV gp41 and 1DF5. This is in order to find out whether their actual and predicted lengths can be related. Four Hydrophobicity-based amino acid scales, namely Eisenberg [7], Kyte-Doolittle [8], Fauchere [9] and Wolfenden [10], are engaged. They have preliminarily been used to assess the physiological and conformational characteristics of the helices and connecting
peptides of proteins. The aim of this study is to identify and proportionate the connecting peptides of two homologous proteins, each consisting of two helices that are separated by connecting peptides of varying amino acid lengths. The two proteins are the HIV gp41 core protein and its crystallographic refinement product, 1DF5. The Continuous Wavelet Transform (CWT) technique and four Hydrophobicity-based amino acid scales are utilized. 1DF5 consists of 68 protein residues, which are divided into two helices linked by a connecting peptide that is two amino acids long (37G-38R). The HIV gp41 Core of the isolate HXB2 has 110 protein residues. It also has two helices, which are separated by a connecting peptide of 45 amino acids (37A-81T). Connecting peptides are recognized to be situated at the protein surface and are known to be hydrophilic in nature. As a result, CWT-based analysis using Hydrophobicity-based amino acid scales is envisaged to reveal the connecting peptide at the minimum wavelet coefficient. One single minimum wavelet coefficient is found at positions 38 and 42 of the scalogram, which identifies the two protein residues of the connecting peptide of the 1DF5 that are actually located at positions 37 and 38 of the protein. On the other hand, the 45-amino-acid connecting peptide of the HIV gp41 core protein, which is actually positioned between 37 and 81, is predicted at positions 37-85 of the scalogram by three minimum wavelet coefficients. Our findings therefore appear to establish a direct relationship between the actual and predicted lengths of the connecting peptides separating the helices of the HIV gp41 Core protein and 1DF5. However, it is recommended that the procedure be improved in order to provide well-organized wavelet coefficients.
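The lengths being compared in this summary can be worked out from the quoted positions. The sketch below treats every pair of positions as an inclusive range; reading the 1DF5 prediction "positions 38 and 42" as the range 38-42 is an assumption, and the names inclusive_length and length_table are hypothetical.

```python
def inclusive_length(start, end):
    """Number of residues from position start to position end, both included."""
    return end - start + 1


# Actual connecting-peptide positions quoted in the summary.
ACTUAL = {"1DF5": (37, 38), "HIV gp41 core": (37, 81)}

# Scalogram positions quoted in the summary (38-42 read as a range: assumption).
PREDICTED = {"1DF5": (38, 42), "HIV gp41 core": (37, 85)}


def length_table():
    """Map each protein to (actual length, predicted length)."""
    return {
        name: (inclusive_length(*ACTUAL[name]), inclusive_length(*PREDICTED[name]))
        for name in ACTUAL
    }


if __name__ == "__main__":
    for name, (actual, predicted) in length_table().items():
        print(f"{name}: actual={actual}, predicted={predicted}")
```

Under this reading, the longer actual connecting peptide corresponds to the longer predicted region, which is the sense in which the two lengths are related.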
8.2 Introduction
Wavelet Transform (WT) is an analytical technique that translates signals such as numerical representations of the protein residues into several groups of coefficients
in order to reveal the characteristics of the signals at different scales. At low scales the resolution is high, so the details are compressed but clearer; at high scales the resolution is low, so the features obtained are extensive but dull [38]. WT is therefore recognized as a multi-scale analytic process for analyzing proteins [114]. WT has played major roles in the analysis of proteins. These are enumerated in Chapter 2. Hydrophobicity and Hydropathy oriented amino acid scales have also preliminarily been used to study the positions of the helices and connecting peptides in over 200 proteins and peptides, and also to examine the structural and physiological characteristics of proteins [7; 8; 9]. They have also played important roles in studying the formation of alpha helices, beta strands and connecting peptides [113] and in identifying the location and position of the helices in proteins such as 1AL1, a synthetic amphiphilic peptide [109]. Though identification of the connecting peptides of 1DF5 has been carried out [109], no study has proportionated the actual and Continuous Wavelet Transform-based predicted lengths of the connecting peptides of the HIV gp41 core protein and its crystallographic refinement product, 1DF5. In this study, the protein residues of 1DF5 and the HIV gp41 core protein are obtained from the Protein Data Bank (PDB) [80] and analyzed with four Hydrophobicity-based amino acid scales obtained from the GenomeNet database [3]. The Hydrophobicity-based amino acid scales are Eisenberg [7], Kyte-Doolittle [8], Fauchere [9] and Wolfenden [10]. This is in order to identify the connecting peptides in the 1DF5 and HIV gp41 core protein and further verify whether the actual lengths of the connecting peptides of the two proteins are proportional to the predicted lengths. The structures of the 1DF5 and HIV gp41 core protein have been studied [283].
The first helix of the 1DF5 consists of the same 33 protein residues as that of the HIV gp41 core protein, except for two mutations (I35S and L36G). The second helix of the 1DF5 is made up of 25 amino acids and is almost homologous to the second helix of the HIV gp41 core protein [283]. However, there is an insertion of Glycine at amino acid position 39 and a mutation T40G, and the connecting peptide between the two helices is only two amino acids (G37 and R38) long. On the other hand, its parent compound, the HIV gp41 Core, consists of 110 amino
acid components (Fig. 8.1). It has two helices, which are separated by a connecting peptide that is 45 amino acids long (37A-81T; white background in the middle). It is therefore noted here that the connecting peptide of the HIV gp41 core is longer than that of the 1DF5 by 43 protein residues.
Figure 8.1: Amino acid sequences, showing the two residues (37G and 38R) of the connecting peptide in (A) the 1DF5, and the 45 residues (37A-81T) in (B) the HIV gp41 core, all indicated by the white background in the middle.

Based on the study [283], the amino acid sequences of 1DF5 and HIV gp41 core (Fig. 8.1) are obtained from the Protein Data Bank (PDB) [80] and analyzed. Connecting peptides are known to be hydrophilic in nature [9; 34; 133]. As a result, when Hydrophilicity-based amino acid scales are engaged, the protein residues of the connecting peptides are identified at the location of the maximum values of the wavelet coefficient. On the other hand, helices, which are hydrophobic in nature, will be displayed at the minimum wavelet coefficient when Hydrophilicity-based amino acid scales are used [9; 133]. The reverse holds when a Hydrophobicity-based amino acid scale is used. In this case, connecting peptides are identified at the minimum wavelet coefficient while helices are highlighted at the maximum values of the wavelet coefficient. Preliminarily, it has been recognized that when the connecting peptide is very short, there may be an interval of one or two protein residues between the actual peptide and the predicted peptide. Additionally, it has been noted that two or more minimum
wavelet coefficients may occur when the connecting peptide is long [9; 133]. Because the connecting peptides of the helices in 1DF5 and HIV gp41 are hydrophilic, they are highlighted at the minimum wavelet coefficient, since Hydrophobicity-based amino acid scales are engaged. The positions and lengths of the regions of minimum wavelet coefficient are therefore further analyzed and compared with the actual positions of the connecting peptides in the two proteins.

CWT-based analysis, like Fourier Transform-based investigation, yields wavelet coefficients in three parts: absolute, imaginary and real. The imaginary parts of the wavelet coefficients have been found to be useful in distinguishing the secondary structures of proteins such as helices and sheets [107], and are therefore engaged in this study. The findings presented in this experiment reveal that the connecting peptides of the 1DF5 and the HIV gp41 core protein can be predicted, and their lengths related proportionally to the predicted regions, using the four amino acid scales. The results obtained using the amino acid scales Eisenberg [7], Kyte-Doolittle [8], Fauchere [9] and Wolfenden [10] are presented in section 8.4. They demonstrate that the actual lengths of the connecting peptides belonging to the 1DF5 and HIV gp41 core protein are predictable using CWT and are proportional to the predicted regions in the scalogram. The methodology engaged, results obtained, discussions made and conclusions drawn from this experiment are described in the subsequent sections.
8.3
Materials and Method
8.3.1
Materials
The amino acid sequences of 1DF5 and HIV gp41 core (Fig. 8.1), as well as their structures (Fig. 8.2), are obtained from the Protein Data Bank (PDB) [80]. Table 8.1 shows the values of the four amino acid scales engaged: Eisenberg [7], Kyte-Doolittle [8], Fauchere [9] and Wolfenden [10]. The experimental procedures and the results of this study are presented in the subsequent sections.
Table 8.1: The four Hydrophobicity-based amino acid scales engaged: Eisenberg [7], Kyte-Doolittle [8], Fauchere [9] and Wolfenden [10]

Amino Acid          Eisenberg [7]   Kyte-Doolittle [8]   Fauchere [9]   Wolfenden [10]
Alanine (A)          0.62            1.8                  0.62           1.12
Arginine (R)        -2.53           -4.5                 -1.37          -2.55
Asparagine (N)      -0.78           -3.5                 -0.85          -0.83
Aspartic acid (D)   -0.90           -3.5                 -1.05          -0.83
Cysteine (C)         0.29            2.5                  0.29           0.59
Glutamine (Q)       -0.85           -3.5                 -0.78          -0.78
Glutamic acid (E)   -0.74           -3.5                 -0.87          -0.92
Glycine (G)          0.48           -0.4                  0.48           1.20
Histidine (H)       -0.4            -3.2                 -0.40          -0.93
Isoleucine (I)       1.38            4.5                  1.38           1.16
Leucine (L)          1.06            3.8                  1.06           1.18
Lysine (K)          -1.5            -3.9                 -1.35          -0.80
Methionine (M)       0.64            1.9                  0.64           0.55
Phenylalanine (F)    1.19            2.8                  1.19           0.67
Proline (P)          0.12           -1.6                  0.12           0.54
Serine (S)          -0.18           -0.8                 -0.18          -0.05
Threonine (T)       -0.05           -0.7                 -0.05          -0.02
Tryptophan (W)       0.81           -0.9                  0.81          -0.19
Tyrosine (Y)         0.26           -1.3                  0.26          -0.23
Valine (V)           1.08            4.2                  1.08           1.13
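As an illustration of how a scale from Table 8.1 turns a protein sequence into a numerical signal, the sketch below encodes a short sequence with the Kyte-Doolittle column. The dictionary values are copied from the table; the names `KYTE_DOOLITTLE` and `encode` are assumptions made for this example and do not describe the software actually used in the thesis.

```python
# Kyte-Doolittle values as listed in Table 8.1 (one-letter residue codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale):
    """Translate a one-letter amino acid sequence into a list of scale values."""
    try:
        return [scale[residue] for residue in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"residue {err.args[0]!r} is not in the scale") from None


# The two connecting-peptide residues of the 1DF5 (G37 and R38).
linker_signal = encode("GR", KYTE_DOOLITTLE)
print(linker_signal)
```

Both linker residues map to negative values on this scale, which is consistent with the hydrophilic character attributed to connecting peptides above.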
8.3.2
Continuous Wavelet Transform (CWT)
The first step is the translation of the alphabetic code of the protein residues into a numerical sequence using the chosen amino acid scale. This numerical signal is then processed by means of the CWT. The process is detailed in Chapter 3. CWT-based processing of signals yields the Wavelet Coefficient, which is defined as a measure of the degree of similarity between the numerical signal and the chosen wavelet, here the Morlet wavelet [38].

Figure 8.2: Structure of (A) the 1DF5, showing the short connecting peptide between the two helices, and (B) the HIV gp41 core, displaying a long connecting peptide.

There are several analyzing, or mother, wavelets. They include Haar, Daubechies, Coiflet, Symlet, Meyer, Morlet and Mexican Hat [116]. However, the Morlet wavelet has been identified as the most appropriate for the prediction of active sites of proteins [104; 112] and, as a result, it is chosen for this analysis. In the CWT-based analysis, the degree of similarity between the numerical signal and the chosen wavelet is measured, and the wavelet is continually shifted to the right until all the protein residues are covered [38]. The coefficients (degrees of similarity) obtained are then represented in a plot referred to as a scalogram. In this study, the blue region of the scalogram represents the motif of the protein with minimum wavelet coefficient while the red region represents the domain of the residues with maximum wavelet coefficient [38]. The x-axis of
the scalogram represents the scale while the y-axis is the position of the protein residues.
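The shifting-and-comparing procedure described above can be sketched with a direct NumPy implementation of a Morlet-based CWT. This is only an illustrative sketch: the function names, the choice of `w0 = 5.0`, the omission of the Morlet correction term and the synthetic test signal are all assumptions made here, not the implementation used in the thesis (which is detailed in Chapter 3).

```python
import numpy as np


def morlet(t, w0=5.0):
    """Complex Morlet wavelet evaluated at times t (correction term omitted)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)


def cwt_morlet(signal, scales, w0=5.0):
    """Return complex CWT coefficients with shape (len(scales), len(signal))."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove the constant offset
    n = x.size
    half = n - 1                          # kernel half-width in samples
    t = np.arange(-half, half + 1)
    out = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        psi = morlet(t / s, w0) / np.sqrt(s)
        # Correlating with psi equals convolving with its reversed conjugate.
        full = np.convolve(x, np.conj(psi)[::-1], mode="full")
        out[i] = full[half:half + n]      # keep the part aligned with x
    return out


# Synthetic stand-in for an encoded protein: a signal with period 10.
positions = np.arange(201)
toy_signal = np.sin(2 * np.pi * positions / 10)
scales = np.arange(1, 21)
coeffs = cwt_morlet(toy_signal, scales)

imaginary_part = coeffs.imag              # the part examined in this chapter
strongest_scale = scales[np.argmax(np.abs(coeffs[:, 100]))]
print(coeffs.shape, strongest_scale)
```

For a real analysis, `toy_signal` would be replaced by a sequence encoded with one of the scales in Table 8.1, and the regions of minimum coefficient would be read from `imaginary_part` at the chosen scale.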
8.4
Results
As shown in Fig. 8.1, the entire HIV gp41 core protein of the HXB2 isolate consists of 110 protein residues. The first helix conformation of the HIV gp41 core protein is composed of about 33 residues while the second helix conformation has about 24 residues. Between these helices is a connecting peptide of 45 residues, which spans positions 37-81. Like the HIV gp41 core protein, the first helix conformation of the 1DF5 consists of the same 33 protein residues. The second helix conformation of the 1DF5 is made of 25 amino acids that are almost homologous to the second helix conformation of the HIV gp41 core [283]. The two helices of the 1DF5 are separated by a connecting peptide only two amino acids long (G37 and R38). The outcomes of the analyses of these two proteins, whose connecting peptides differ in length (45 amino acids in the HIV gp41 core and 2 amino acids in the 1DF5), are presented in Table 8.2. Typical scalograms of the 1DF5, obtained by means of the Eisenberg amino acid scale [7], and of the HIV gp41 core, examined using the Wolfenden scale [10] (descriptor name WOLR790101), are shown in Fig. 8.3 to demonstrate the actual and predicted positions of the connecting peptides.

Table 8.2: Result of the CWT analysis of the 1DF5 and HIV gp41 using the amino acid scales Eisenberg [7] and Wolfenden [10]. MV: Minimum value

Protein    AA Scale    AA position   Identified MV positions   No of MVs
1DF5       Eisenberg   37 - 38       38-42                     1
HIV gp41   Wolfenden   37 - 81       37-85                     3
There is only one minimum wavelet coefficient (blue) in the scalogram of the 1DF5, which consists of 68 protein residues, with the connecting peptide located at the two protein residues at positions 37 and 38. Vertically, the entire scalogram is divided into 34 parts and each part is 2 units. As shown in Fig. 8.3, the minimum wavelet coefficient for the 1DF5 falls between the 19th and 21st parts. This signifies that it is located at the 38th-42nd positions of the protein residues of the 1DF5. Similarly, in the scalogram of the HIV gp41 core protein, which has 110 protein residues, the connecting peptide is found within the 37th-81st positions. The scalogram is divided into 22 parts and each part is 5 units. The minimum wavelet coefficients are located from the 7th part to about the 17th part. This appears to signify that the connecting peptide lies between positions 35 and 85. The results for the 1DF5 and the HIV gp41 core protein using the Eisenberg-based (Fig. 8.3), Fauchere-based (Fig. 8.5), and Kyte and Doolittle-based (Fig. 8.6) amino acid scales are as shown.

Figure 8.3: Eisenberg-based scalogram of (A) the 1DF5 (Scale 2), showing one minimum wavelet coefficient (blue) between positions 38 and 42, and (B) Wolfenden-based result of HIV gp41 core (Scale 1), showing three minimum wavelet coefficients (blue) that span positions 35 to 85.

Figure 8.4: Eisenberg-based scalogram of (A) the 1DF5 (Scale 2), demonstrating one minimum wavelet coefficient, and (B) HIV gp41 core (Scale 1), showing three minimum wavelet coefficients.

Figure 8.5: Fauchere-based scalogram of (A) the 1DF5 (Scale 2), demonstrating one minimum wavelet coefficient, and (B) HIV gp41 core (Scale 1), showing three minimum wavelet coefficients.

Figure 8.6: Kyte and Doolittle-based scalogram of (A) the 1DF5 (Scale 2) and (B) HIV gp41 core (Scale 2).
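The conversion from scalogram parts to residue positions used above is a multiplication of the part index by the number of residues per part. A minimal sketch follows; the helper name `part_to_position` is hypothetical and only restates the arithmetic in the text.

```python
def part_to_position(part_index: int, units_per_part: int) -> int:
    """Residue position reached at the end of a given scalogram part."""
    return part_index * units_per_part


# 1DF5: 34 parts of 2 residues; minimum between the 19th and 21st parts.
df5_total = part_to_position(34, 2)
df5_region = (part_to_position(19, 2), part_to_position(21, 2))

# HIV gp41 core: 22 parts of 5 residues; minima from the 7th to the 17th part.
gp41_total = part_to_position(22, 5)
gp41_region = (part_to_position(7, 5), part_to_position(17, 5))

print(df5_total, df5_region, gp41_total, gp41_region)
```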
8.5
Discussion
Wavelet Transform has played major roles in the analysis of proteins. It has helped examine protein structures, including the locations and the amino acid contents of the 'hotspots' of proteins, strands, helices, and connecting peptides [38]. In this study, the connecting peptides separating two sets of homologous helices from two proteins, 1DF5 and HIV gp41 core, which have been studied [283] and whose sequences and structures are deposited in the databases UNIPROT [79] and Protein Data Bank (PDB) [80], are analyzed using CWT and four Hydrophobicity-based amino acid scales (Table 8.1). This is in order to predict the positions of the connecting peptides. The CWT procedure has contributed immensely to the study of protein secondary structure [7; 8; 9; 109].

The nature of the procedure and materials employed discloses inherent validation. 1DF5 is a crystallographic product of the HIV gp41 core. Both have two helices separated by connecting peptides. The helices are homologous. However, their connecting peptides consist of different numbers of amino acids. While the connecting peptide of 1DF5 has two amino acids, that of the HIV gp41 core has 45 protein residues. As a result, there are 43 more protein residues in the connecting peptide of the HIV gp41 core than in that of 1DF5. CWT is applied to study this difference using four Hydrophobicity-based amino acid scales, which have preliminarily been employed in the evaluation of the location and the amino acid contents of proteins.

The results obtained in this study reveal that the connecting peptide of the 1DF5 at positions 37 and 38 is spotted at the 38th-42nd positions, while that of the HIV gp41 at positions 37-81 is identified at the 37th-85th positions using CWT. They are spotted as minimum wavelet coefficients using the Hydrophobicity-based amino acid scales. While the connecting peptide of the 1DF5, which consists of two amino acid residues, is predicted as a single spot, the 45-amino-acid connecting peptide of the HIV gp41 core is highlighted as three spots in all the amino acid scales used.
The findings made in this investigation appear to suggest that the connecting
peptides of the 1DF5 and HIV gp41 core are appropriately predicted. This is because, according to the preliminary procedures for the prediction of connecting peptides [9; 133], the predicted peptide may be one or two protein residues away from the actual peptide when the connecting peptide is very short. In this experiment, the protein residues of the 1DF5 connecting peptide are located at positions 37 and 38. However, they are predicted at the 38th-42nd positions. The predicted region is therefore one amino acid away from the actual position at the N-terminal end and four amino acids away at the C-terminal end. This therefore suggests accurate prediction of the connecting peptide of the 1DF5.

In the case of the HIV gp41 core protein, the predicted positions of the connecting peptide seem to coincide with the actual positions. According to the CWT-based procedure, a long connecting peptide may be predicted by means of two or more minimum wavelet coefficients [9; 133]. The results obtained from this experiment demonstrate three minimum wavelet coefficients between positions 37-83. The predicted region thus starts exactly at the actual N-terminal position and ends about two protein residues beyond the actual C-terminal position. Therefore, the actual position of the connecting peptide of the HIV gp41 core protein (37-81), which is predicted by means of three minimum wavelet coefficients found at positions 37-83, also suggests precise prediction.

The results obtained in this study therefore appear to demonstrate that proportionality can be established between the actual and Continuous Wavelet Transform-based predicted lengths of the connecting peptides of the HIV gp41 core protein and its crystallographic refinement product, 1DF5, using four Hydrophobicity-based amino acid scales.
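The comparison of actual and predicted boundaries can be written out as a short calculation. The sketch below uses the positions listed in Table 8.2; the `Region` type and the `boundary_offsets` helper are names introduced here for illustration only.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int  # first residue position (N-terminal end)
    end: int    # last residue position (C-terminal end)


def boundary_offsets(actual: Region, predicted: Region):
    """Offsets of the predicted region at the N- and C-terminal ends."""
    return predicted.start - actual.start, predicted.end - actual.end


# (actual connecting peptide, region of minimum wavelet coefficient), Table 8.2
table_8_2 = {
    "1DF5 (Eisenberg)": (Region(37, 38), Region(38, 42)),
    "HIV gp41 core (Wolfenden)": (Region(37, 81), Region(37, 85)),
}

offsets = {name: boundary_offsets(actual, predicted)
           for name, (actual, predicted) in table_8_2.items()}
print(offsets)
```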
8.6
Conclusion
Wavelet Transform offers a more detailed analysis of the biological functionalities of proteins because it provides multi-scale examination. As a result, it has been extensively employed in the study of the physiological and structural characteristics of proteins. In this study, the CWT procedure is used to identify the connecting peptides of the 1DF5 and HIV gp41 core protein, and further to find out if the amino acid
lengths of the connecting peptides are proportional to the lengths of their minimum-value wavelet coefficient regions. The proteins are HIV gp41 core and 1DF5, and four amino acid scales are engaged. The entire process demonstrates intrinsically validated methodology and materials. The outcome of our investigation reveals that the connecting peptides of the 1DF5 and HIV gp41 core protein can be accurately predicted using CWT. The outcome is encouraging, and the approach shows great potential for application in similar investigations. It encourages further use of the CWT procedure in the identification of protein secondary structures.
Chapter 9
Conclusions
9.1
Introduction
Clinical approaches to assessing the biological functionalities of proteins and peptides as a means of understanding disease pathogenesis and, as such, discovering, designing and developing cures are known to be saddled with time-wasting, resource-consuming and labour-intensive laboratory experimentation. Post-genomic Bioinformaticians are now besieged by an ever-increasing deposit of proteomic and genomic information that needs to be investigated. Knowledge of this huge body of data on proteins and peptides will help to understand disease processes, discover associated biological functionalities, and design and develop drugs, vaccines and biomedical devices. Bioinformatics approaches to studying the Bio-functionalities of these proteins and peptides using sequence information and amino acid scales have become necessary. In this research, the Resonant Recognition Model (RRM), the Informational Spectrum Method (ISM) and the Continuous Wavelet Transform (CWT) method are applied to HIV, Plasmodial and host proteins in order to study the interactions that exist amongst them, as a step towards understanding disease progression and designing and developing therapeutic interventions. No generated protein residues are engaged. They have all been clinically examined. The methodologies and the Amino Acid Scales (ASSs) applied, as well as all protein residues used, have been validated using clinical and computational
approaches. This therefore depicts the inherent validity of the procedures employed. This chapter highlights the relevance and applicability of the findings made in this research to present-day use. It correlates preliminary clinical findings with the computationally derived results in order to present the in-depth implications and potential applications of our findings in modern medical and pharmaceutical practice. The chapter thereafter concludes by considering studies that require improvement in future. The implications and potential applications of our findings are presented as follows:
9.1.1
HIV Progression to AIDS
Several causes have been attributed to the transformation of HIV to AIDS. They include decline of the immune response, increase in the HIV replication rate, and the ability to induce syncytium and infect tumour cell lines. However, the role played by the increase in the affinity of the HIV-1 T-tropic viruses, which dominate the late stage of the HIV infection as a result of mutations, has not been addressed. In order to understand the mechanism of the HIV progression to AIDS, this role is examined using the Resonant Recognition Model (RRM). The degree of affinity between the host CD4+ T cells and the gp120 from the HIV-1 macrophage-tropic (M-tropic) viruses, the HIV-1 T-lymphocyte-tropic (T-tropic) viruses, as well as the isolates of HIV-2 and Simian Immunodeficiency Virus (SIV) is evaluated. As shown in Figure 9.1, out of the four classes of HIV and SIV, namely HIV-1 T-tropic and M-tropic viruses, HIV-2 and SIV, only the HIV-1 T-tropic viruses are found to bind effectively to the CD4+ T cells. This is because they share a similar Consensus Frequency (CF) with the CD4+ T cells (Table 9.1). According to the principle of the Resonant Recognition Model, proteins which share a common CF are known to bind. They have a maximum amplitude value of 1.0 at the CF, which symbolizes 100% binding to the CD4+ T cells. Other classes are found to have weak affinity for the CD4+ T cells.

Figure 9.1: Percentage affinities of the four classes of HIV and SIV proteins to CD4+ T cells

A study of one isolate from each class further demonstrates that the HIV T-tropic viruses bind effectively to the CD4+ T cells. Figure 9.2 shows that HXB3, from the class of HIV-1 T-tropic viruses, has the highest binding capability of 96.55% when compared with YBF30, which belongs to the HIV-1 M-tropic viruses and has 20.45% affinity for the CD4+ T cells. Additionally, the HIV-2 isolate Ghana-1 has 40.45% binding interaction with the CD4+ T cells, while CPZ GAB1, an SIV, has 37.82%. As earlier stated, the HIV T-tropic viruses are known to dominate the late stage of HIV/AIDS, when the CD4+ T cells are greatly depleted. They are recognized to have more amino acid alterations than the M-tropic viruses, which control the viral population at the asymptomatic and sero-conversion stage. Transformation of the M-tropic viruses into the T-tropic viruses has been associated with mutations in the V3 domain of the HIV gp120 belonging to the M-tropic viruses [152]. As HIV multiplies rapidly, it eats up the CD4+ T cells and, as a result, the CD4+ T cells deplete. CD4+ T cells are the only source of nutrients for the HIV, and they are speedily diminished as the virus rapidly replicates. In order to survive, HIV engages in an activity that is proposed in this research to be responsible for the transformation of HIV infection into AIDS.
Figure 9.2: Affinities of an exemplary isolate from each class to the CD4+ T cells

It was observed that during the transitional phase, equilibrium appears to be achieved between the fast-multiplying HIV and the fast-depleting CD4+ T cells. CD4+ T cells are the major viral source of nutrients and, as the CD4+ T cells dwindle, HIV M-tropic strains appear to undergo some mutational changes in the V3 and other domains. These changes bring about transformation into T-tropic viruses. This empowers them to sustain their grip on the continually dwindling CD4+ T cells. As a result, there is continued HIV replication and destruction of CD4+ T cells and immune defence. The clinical signs and symptoms of immune breakdown resulting from HIV invasion are defined as AIDS.

This finding has helped in understanding the pathogenesis of HIV/AIDS, as it elucidates the role played by the increase in affinity of the HIV T-tropic viruses for the CD4, which is caused by mutations in the V3 region of the HIV gp120 and appears to have helped HIV infection translate into AIDS. It has also provided insight into the design and development of effective therapeutic interventions and management plans for HIV/AIDS. It prescribes the development and application of drugs, preventive devices and vaccines that would target the V3 motifs of the HIV at the earliest stage of the HIV infection. This is vital since initiation of HIV/AIDS treatment is known to preserve immune function and reduce transmission.

Table 9.1: Consensus Frequencies of the four classes of HIV and SIV proteins

Class of Protein          Consensus Frequency
CD4+ T cells of Host      0.0373
HIV and SIV gp120         0.0354
HIV-1 T-tropic viruses    0.0354
HIV-1 M-tropic viruses    0.1045
HIV2 Isolates             0.3714
SIV Isolates              0.2362
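The statement that only the T-tropic class shares a similar CF with the CD4+ T cells can be checked directly against the values in Table 9.1. The sketch below ranks the four virus classes by the absolute difference between their CF and that of the host; the ranking function is an illustrative assumption, not the RRM computation itself.

```python
# Consensus Frequencies of the four virus classes, copied from Table 9.1.
CONSENSUS_FREQUENCY = {
    "HIV-1 T-tropic viruses": 0.0354,
    "HIV-1 M-tropic viruses": 0.1045,
    "HIV-2 isolates": 0.3714,
    "SIV isolates": 0.2362,
}
CD4_CF = 0.0373  # CD4+ T cells of the host (Table 9.1)


def rank_by_cf_distance(target_cf, class_cfs):
    """Sort classes by how far their CF lies from the target CF."""
    return sorted((abs(cf - target_cf), name) for name, cf in class_cfs.items())


ranking = rank_by_cf_distance(CD4_CF, CONSENSUS_FREQUENCY)
closest_distance, closest_class = ranking[0]
for distance, name in ranking:
    print(f"{name}: |CF difference| = {distance:.4f}")
```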
9.1.2
Characterization and Identification of the HIV Tropic and Phenotypic Associations
In developing therapeutic interventions, it is pertinent to first understand the physiological properties of the organisms and viruses, including their Tropic and Phenotypic associations. In this study, a Digital Signal Processing-based Bioinformatics method called the Resonant Recognition Model was applied to the HIV surface protein (HIV gp120) from 53 isolates of HIV and SIV, and the target protein CD4 from 25 hosts. This is in order to clarify their Tropic and Phenotypic associations, and also to identify relationships between and amongst the viruses and host organisms. The result is interesting: 14 out of the 19 HIV-1 T-tropic viruses studied here are found to possess more than 50% affinity for the CD4. On the other hand, 7 of the 12 HIV-1 M-tropic viruses demonstrated less than 50% affinity for the CD4. Clinically, HIV-1 T-tropic viruses are known to have a high level of virulence and pathogenicity [162]. M-tropic viruses are also recognized to demonstrate low virulence and to be less pathogenic [155; 162]. They infect the CD4 using CCR5 [197]. Our findings in this study therefore demonstrate that these clinical findings can be predicted using computational methods.
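Expressed as proportions, the counts quoted above can be computed as follows. This is plain arithmetic on the figures in the text; the helper name `share` is introduced only for this example.

```python
def share(count: int, total: int) -> float:
    """Percentage of isolates in a class that meet a criterion."""
    return 100.0 * count / total


# 14 of 19 HIV-1 T-tropic isolates show more than 50% affinity for the CD4.
t_tropic_high = share(14, 19)
# 7 of 12 HIV-1 M-tropic isolates show less than 50% affinity for the CD4.
m_tropic_low = share(7, 12)

print(f"T-tropic above 50%: {t_tropic_high:.1f}%")
print(f"M-tropic below 50%: {m_tropic_low:.1f}%")
```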
Besides, it is interesting to observe that the HIV MFA isolate, which is clinically known to be the most virulent and cytopathogenic (greatest potential to cause damage or disease to cells) [168], also maintained the highest affinity to the CD4. The clinical and computational results show strong correlation, which demonstrates that the destructive nature of the MFA can be computationally determined. In addition, our finding that HIV-2 and SIV isolates demonstrate weak affinity for CD4 is in accord with established clinical findings. HIV-2 and SIV isolates are recognized as non-progressors [196] and CD4-independent, as they are found to circumvent the gp120-CD4 binding. They form a distinct conformation that enables contact with a Transmembrane co-receptor [196].

At the proteomic level, these findings further strengthen the already established finding that Human and Chimpanzee share similar biological characteristics. This is because they demonstrate a common point of highest affinity (see Table 9.3). The spectral characteristics of both organisms share the same position of maximum interaction. Similar results are obtained in other species such as the Green, Dancing and Pig-tailed monkeys, as shown in Table 9.3. The results of the HIV isolates termed BH10, HXB3, and HXB2 and its clone MFA (Table 9.3) demonstrated a common point of highest affinity. This finding is in accord with the preliminary phylogenetic studies, as they are recognized to have emanated from the same stock [172; 215]. Cross-Atlantic transmission of HIV/AIDS was also observed, as OYI and Z6, which belong to the Gabonese and Zairian stocks [170; 284], share the same point of highest affinity with the American isolate CDC-451 [173]. In the same manner, another American isolate, SC, a Zairian stock, WMJ1, and a Cameroonian isolate referred to as 96CM-MP535 share the same point of highest affinity (Table 9.2).
In addition, ape-to-human cross-species transmission was strengthened, as VI850, which belongs to the HIV-1 (human) group M [188], is found to share the same position of highest affinity with an SIV (chimpanzee) isolate called MB66 (Table 9.3). OYI and KB-ETR, with the highest affinities for the CD4 amongst the unclassified group (87.51% and 67.19%, respectively), are predicted to belong to the T-tropic viruses with SI capacity. Similarly, 96CM-MP535 and MAL, with the least affinities (12.36% and 29.21%), are suggested to be grouped amongst the M-tropic viruses with NSI capacity.
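The relationships discussed above amount to grouping isolates that share the same point of maximum affinity. The following sketch reproduces that grouping from the point values reported in Tables 9.2 and 9.3; the dictionary and the `group_by_peak` helper are illustrative constructs, not the thesis software.

```python
from collections import defaultdict

# Point of maximum affinity per isolate, as reported in Tables 9.2 and 9.3.
PEAK_POINT = {
    "SC": 152, "WMJ1": 152, "96CM-MP535": 152,
    "Z6": 155, "OYI": 155, "CDC-451": 155,
    "HXB3": 18, "HXB2": 18, "BH10": 18, "MFA": 18,
    "MAL": 150, "ELI": 150, "Z2": 150,
    "VI850": 158, "MB66": 158,
}


def group_by_peak(peaks):
    """Collect isolates that share the same point of maximum affinity."""
    groups = defaultdict(list)
    for isolate, point in peaks.items():
        groups[point].append(isolate)
    return {point: sorted(names) for point, names in sorted(groups.items())}


groups = group_by_peak(PEAK_POINT)
for point, isolates in groups.items():
    print(point, ", ".join(isolates))
```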
Table 9.2: Maximum amplitude-based prediction of relationships in HIV and SIV isolates across the Atlantic.

Points of Maximum Affinity   HIV Isolates and Hosts   Suggested Relationship
152 (f=0.299)                SC, WMJ1, 96CM-MP535     Trans-Atlantic Transmission
155 (f=0.305)                Z6, OYI, CDC-451         Trans-Atlantic Transmission

Table 9.3: Maximum amplitude-based prediction of relationships in HIV and SIV isolates, and the host species.

Points of Interaction   HIV Isolates and Hosts                 Relationship
18 (f=0.0354)           HXB3, HXB2, BH10, MFA                  CD4 candidate interactor
150 (f=0.295)           MAL, ELI, Z2                           Zairian stock
158 (f=0.311)           VI850 (HIV), MB66 (SIV)                Ape to Human cross-species
101 (f=0.210)           Green, Dancing & Pig-tailed monkeys    Same Species
68 (f=0.141)            Human, Chimpanzee                      Same origin

9.1.3
Bioinformatics Approach to Assessing Drug Resistance
Using the Informational Spectrum Method (ISM), appropriate amino acid scales and sequence information of the HIV target proteins for five anti-HIV/AIDS agents, the resistance to the five anti-HIV/AIDS agents is assessed without involving resource-wasting and time-consuming clinical experimentation. The anti-HIV/AIDS agents are Raltegravir, an Integrase Inhibitor; Darunavir, a Protease Inhibitor; Lamivudine, a Nucleoside Reverse Transcriptase Inhibitor; Enfuvirtide, a Fusion Inhibitor; and Bevirimat, a Maturation Inhibitor. In this experiment, protein residues of the consensus sequences are obtained from the Stanford Database [221]. The protein residues of both susceptible and resistant strains are then constructed based on literature and analyzed. It is disclosed in this study that the protein residues of the susceptible strains attain the maximum activities at the point of interaction, unlike the resistant strains. Figure 9.3 is a comparison between the drug activities of the susceptible and resistant strains demonstrated by Enfuvirtide, Darunavir and Lamivudine.
Figure 9.3: Pharmacological activities of the susceptible (blue) and resistant (red) strains of the HIV target proteins exposed to (1) Enfuvirtide, (2) Lamivudine, (3) Darunavir, (4) Bevirimat and (5) Raltegravir. All susceptible strains show 100% activity while the resistant strains demonstrate less activity.

As shown in the figure, the resistant strains exposed to Enfuvirtide demonstrated more than 2% resistance, while those exposed to Darunavir showed about 10% and those exposed to Lamivudine more than 5% reduced susceptibility. Bevirimat and Raltegravir displayed 6.48% and 3.16% resistance respectively. Drug resistance assessment using sequence information of proteins has become a new and interesting area of study that will lead to investigation into drug hypersensitivity, discordant outcomes, and effects of compensatory mutations. Application of the drug resistance calculation to antiviral, anti-bacterial, antimalarial and anti-retroviral agents has become a rational means of assessing the therapeutic indices or potency of drugs.
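Since the susceptible strain is taken as 100% activity, the resistance figures quoted above can be read as the shortfall of the resistant strain's activity relative to the susceptible one. The sketch below expresses that reading; treating activity as a ratio of spectral amplitudes is an assumption made for illustration, and the amplitude values are hypothetical numbers chosen only to reproduce the 6.48% reported for Bevirimat.

```python
def resistance_percent(susceptible_amplitude: float,
                       resistant_amplitude: float) -> float:
    """Resistance as the activity shortfall relative to the susceptible strain."""
    if susceptible_amplitude <= 0:
        raise ValueError("susceptible amplitude must be positive")
    activity = 100.0 * resistant_amplitude / susceptible_amplitude
    return 100.0 - activity


# Hypothetical amplitudes (not taken from the thesis), for illustration only.
bevirimat_like = resistance_percent(1.0, 0.9352)
print(f"{bevirimat_like:.2f}% resistance")
```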
9.1.4
Prediction of Binding Interactions In Plasmodial and Host Proteins
As shown in Table 9.4, computational assessment of Plasmodial and host proteins reveals that the binding interactions existing between the Plasmodial proteins and their host proteins can be predicted. Using RRM, the adhesive domain of the Circumsporozoite Protein (CSP) and Importin α 3, which are clinically verified to bind [248], are found to bio-recognize and bio-adhere. This is because the CSP, which has a CF of 0.359, appears to share a similar CF with the Importin α, which has a CF of 0.332. This characteristic, according to the RRM principle, signifies binding interaction. Another Plasmodial protein called AMA-1, with a CF of 0.276, which is known to clinically interact with human Interleukin 9, whose CF is at 0.258 [278], is found to interact with it using computational methods. AMA-1 is also noted to bind to the RON complexes of the tachyzoite of Toxoplasma gondii and of the plasmodial merozoite (CF=0.288) [258; 277]. In this experiment, they appear to bio-recognize and bio-adhere. Additionally, the conserved region of the TSP in the CSP Region 11 (CSPR11), with CF at 0.368, which is identified to be responsible for the interaction with the Cluster of Differentiation 36 (CD36), which has CF at 0.346 [279], is found to share the same point of interaction with it. This symbolizes that they bind to each other.

Notwithstanding, interactions such as that between the Ring-infected Erythrocyte Surface Antigen (RESA), which has a CF of 0.24, and either of the two forms of Spectrin, which are clinically ascertained [249; 280], could not be established. This is because the complete sequences of RESA, Spectrin α and Spectrin β are long and consist of several domains with various biological functionalities. RESA has over 1000 protein residues and consists of three motifs, including J and Tandem Repeats 1 and 2 [79]. Spectrin β also has over 2000 protein residues and comprises 20 domains [79]. Spectrin α has about 2500 protein residues and 27 domains [79].
Analysis of all the domains from both Plasmodial and host proteins may reveal the domains that interact. It has been noted that, clinically, protein-protein interaction is sequence-content-dependent [250]. Plasmodium falciparum Erythrocyte Membrane Protein 1
(PfEMP1) has about 3,000 residues that constitute the N-terminal Segment (NTS), Duffy Binding Like (DBL), Cysteine-Rich Interdomain Region (CIDR), Transmembrane (TM) and Acidic Terminal Segment (ATS) domains. Each motif is known to clinically interact with a different protein. For example, CIDR α is known to interact with the CD31. Computationally, it has also been observed that assessment of interaction is sequence-content-dependent [35; 98]. This is because the HIV Envelope Protein, which embodies the Surface Protein (gp120) and the Transmembrane protein (gp41), is not found to interact with the host CD4. However, the HIV Surface Protein (gp120) alone is observed to share the same point of interaction with the CD4, signifying interaction. It is also suggested in this experiment that CIDR α interacts with CD31. This is because CIDR α appears to share a similar CF (0.131) with the CD31 (0.174), which according to RRM suggests binding interaction. This finding also appears to suggest that computational approaches strengthen the clinical finding that protein-protein interactions are sequence-content-dependent. Since both clinical interactions and computational assessments are sequence-content-dependent, it is necessary that exact protein residues be engaged in order to obtain the desired results.

Table 9.4: Correlation of the clinical and computational relationships.

Plasmodial      CF      Host               CF      Clinical   Computational
CSP             0.359   Importin alpha 3   0.332   Bind       Bind
AMA-1           0.276   RON                0.288   Bind       Bind
AMA-1           0.276   Interleukin 9      0.258   Bind       Bind
CSP Region 11   0.368   CD36               0.346   Bind       Bind
RESA            0.25    Spectrin           0.382              Do not Bind
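The CF comparison behind Table 9.4 can be illustrated by computing the gap between each pair of Consensus Frequencies. The thesis does not state a numerical cut-off for "similar" CFs, so the `TOLERANCE` value below is a hypothetical threshold chosen only for this sketch; the CF values are copied from the table.

```python
# (Plasmodial protein, CF, host protein, CF, computational outcome in Table 9.4)
TABLE_9_4 = [
    ("CSP", 0.359, "Importin alpha 3", 0.332, "Bind"),
    ("AMA-1", 0.276, "RON", 0.288, "Bind"),
    ("AMA-1", 0.276, "Interleukin 9", 0.258, "Bind"),
    ("CSP Region 11", 0.368, "CD36", 0.346, "Bind"),
    ("RESA", 0.25, "Spectrin", 0.382, "Do not Bind"),
]

TOLERANCE = 0.05  # hypothetical cut-off, chosen only for this illustration


def cf_gap(cf_a: float, cf_b: float) -> float:
    """Absolute difference between two Consensus Frequencies."""
    return abs(cf_a - cf_b)


def shares_cf(cf_a: float, cf_b: float, tolerance: float = TOLERANCE) -> bool:
    """True when the two CFs lie within the chosen tolerance."""
    return cf_gap(cf_a, cf_b) <= tolerance


summary = [(plasmodial, host, round(cf_gap(a, b), 3), shares_cf(a, b))
           for plasmodial, a, host, b, _ in TABLE_9_4]
for row in summary:
    print(row)
```

Under this assumed tolerance, the four pairs reported as binding fall within the cut-off, while the RESA and Spectrin pair, with a gap of 0.132, does not.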
9.1.5
Continuous Wavelet Transform-Based Study of the Connecting Peptides of HIV gp41 and 1DF5
The Continuous Wavelet Transform-based study of the connecting peptides with varying length of protein residues from two proteins, 1DF5 and HIV gp41 Core
Protein separating two sets of homologous helices reveals that the connecting peptide of both 1DF5 and HIV gp41 Core Protein can be identified. Four Hydrophobicity-based amino acid scales obtained from the database [3], which have preliminarily been engaged to study protein primary structures are used. The result shown in Table 9.5, appears to suggest that the connecting peptide of the 1DF5, which is at 37G and 38R yields one single wavelet coefficient local minimum value between 38-42 position. Also, the 45 amino acids length of the connecting peptide belonging to the HIV gp41 Core Protein, which is actually located at positions 37 - 81 of the protein residues seems to be presented as three minimum values of the wavelet coefficients that occupy 37-85th positions in the scalogram. Morlet wavelet is engaged. This is because it is known to be most suitable wavelet detecting of active sites in the protein residues [104; 112]. These results are obtained using Eisenberg-based amino acid scale for the 1DF5, and Wolfenden-based scale for the HIV gp41 Core Protein. This appears to demonstrate that the connecting peptides appear to be accurately detected. The lengths of the region at which they are detected in the scalogram also seem to be in proportion with the actual length of the connecting peptide. This result is interesting as it appears to encourage further use of Continuous Wavelet Transform procedures for protein structure studies. This is because the method engaged appears to appropriately identified the connecting peptides of the 1DF5 and HIV core proteins. This therefore seems to confirm the reliability of the technique and also encourages future application in other species. Table 9.5: Result of the CWT analysis of the 1DF5 and HIV gp41 using amino acid scales: Eisenberg and Wolfenden . MV: Minimum Value Protein A A Scale Actual Position Detected MV Position No of MVs 1DF5
Eisenberg
37 - 38
38-42
1
HIV gp41
Wolfenden
37 - 81
37-85
3
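The procedure summarised above (encode the residues with a hydrophobicity scale, apply a Morlet-based CWT, and read off the local minima of the coefficients) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used to produce Table 9.5: the Kyte and Doolittle scale [8] is substituted for the Eisenberg and Wolfenden scales, the toy sequence is invented, and the function names (`encode`, `morlet`, `cwt_row`, `local_minima`), the scale value 4.0 and the wavelet centre frequency `w0 = 5.0` are hypothetical choices.

```python
import math

# Kyte-Doolittle hydropathy values (reference [8]); an assumption made for
# illustration, since Table 9.5 uses the Eisenberg and Wolfenden scales.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale=KYTE_DOOLITTLE):
    """Convert residues to scale values and subtract the mean."""
    values = [scale[residue] for residue in sequence]
    mean = sum(values) / len(values)
    return [value - mean for value in values]


def morlet(t, w0=5.0):
    """Real-valued Morlet wavelet: a Gaussian-windowed cosine."""
    return math.exp(-0.5 * t * t) * math.cos(w0 * t)


def cwt_row(signal, scale):
    """CWT coefficients at one scale, by direct summation over the signal."""
    norm = 1.0 / math.sqrt(scale)
    return [
        norm * sum(x * morlet((k - b) / scale) for k, x in enumerate(signal))
        for b in range(len(signal))
    ]


def local_minima(row):
    """1-based positions whose coefficient is below both neighbours."""
    return [
        i + 1
        for i in range(1, len(row) - 1)
        if row[i] < row[i - 1] and row[i] < row[i + 1]
    ]


if __name__ == "__main__":
    # Hypothetical helix-loop-helix toy sequence, not 1DF5 or gp41.
    toy = "LLKELLKKLLEELGSGRGSPLLKELLKKLLEEL"
    row = cwt_row(encode(toy), scale=4.0)
    print(local_minima(row))
```

In practice a full scalogram would repeat `cwt_row` over a range of scales; a single scale is shown here only to keep the sketch short.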
9.2 Future work
Computational approaches are effectively complementing labour-intensive, slow and expensive manual techniques. It is evident that Bioinformatics procedures will, in future, efficiently augment clinical methods. Computational devices already form part of the instrumentation in hospitals, the pharmaceutical industry and non-medical institutions. In line with the findings presented in this thesis, the following studies are worth considering for future improvement.

• 1. Prediction of the Potencies of Therapeutic Agents: This rational approach has become necessary. It is demonstrated in this research using two anti-HIV/AIDS drugs, Enfuvirtide and Sifuvirtide, and two starter materials for anti-malaria vaccine design and development, P18 and P32. It is essential that these exercises be repeated in future with other agents to make room for the transition to computerization of these pharmaceutical assays.

• 2. Resonant Recognition-based large-scale predictions of binding interactions: Using Plasmodial proteins, RRM-based prediction of biological interactions is established: proteins that interact clinically are found to share a similar spectral frequency. Extending this process to other proteins will support Digital Signal Processing-based prediction of biological interactions. The study also recommends that future work examine Resonant Recognition-based large-scale prediction of binding interactions amongst proteins, with consideration of the sequence-content dependence of the prediction. Large-scale predictions of binding interactions amongst proteins have previously involved Domain Motif Interaction from Structural Topology (D-MIST) [285] and Motif-Motif Interaction from Structural Topology (M-MIST) [286].
• 3. Drug Resistance Assessment: Resistance arising from exposure of HIV target proteins to five classes of antiretrovirals is assessed in this research using a Signal Processing-based approach called the Informational Spectrum Method (ISM), which incorporates the sequence information and one amino acid scale. It has been identified that more than one mechanism of action, and hence more than one amino acid scale, is involved in a single mutation. It is therefore recommended that, in future, all relevant amino acid scales be identified and used to obtain a complete resistance assessment. Consideration should also be given to applying the technique to other organisms.

• 4. Continuous Wavelet Transform Approach to Detecting Protein Secondary Structures: The Continuous Wavelet Transform approach is utilized in this research to identify the connecting peptide linking the two helices of the HIV gp41 Core Protein and its crystallographic product, 1DF5. The positions of the connecting peptides, and the proportionality between their lengths and the extent of the wavelet coefficient minima, are established. The technique needs to be applied to other proteins. The procedure also needs to be improved so that the wavelet coefficients are streamlined and displayed clearly rather than in clusters. In addition, it is recommended that further investigations be carried out using the Kyte and Doolittle amino acid scale [8].

Finally, the research work described in this thesis advocates replacement of clinical approaches with computational procedures for the prediction of biological functionalities. It further recommends that application of these computational approaches to novel target proteins may lead to the design and development of therapeutic interventions, especially against the recalcitrant causative organisms of HIV/AIDS and Tuberculosis.
Computer-aided drug resistance computation is one novel and interesting aspect of this research. With a Pharmacy background, it is exciting for me to know that interactions in the body, as well as drug resistance, can be computed using sequence information.
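As a pointer for the large-scale RRM work proposed in item 2, the idea that interacting proteins share a spectral frequency can be sketched as a cross-spectrum computation. The sketch below is only an illustration under stated assumptions: residues are encoded with the electron-ion interaction potential (EIIP) values commonly tabulated for the RRM, the shorter sequence is zero-padded to the length of the longer one, and the function names (`encode`, `dft_amplitudes`, `cross_spectrum`, `consensus_frequency`) are hypothetical. It is not the code used in this thesis.

```python
import cmath

# EIIP values as commonly tabulated for the RRM; assumed here for
# illustration rather than taken from this chapter.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode(sequence, length):
    """Map residues to EIIP values, remove the mean, zero-pad to `length`."""
    values = [EIIP[residue] for residue in sequence]
    mean = sum(values) / len(values)
    centred = [value - mean for value in values]
    return centred + [0.0] * (length - len(centred))


def dft_amplitudes(signal):
    """Amplitude spectrum for frequency indices 1 .. n // 2 (DC excluded)."""
    n = len(signal)
    amplitudes = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * m / n)
            for m, x in enumerate(signal)
        )
        amplitudes.append(abs(coeff))
    return amplitudes


def cross_spectrum(seq_a, seq_b):
    """Frequencies (0 to 0.5) and the product of the two amplitude spectra."""
    n = max(len(seq_a), len(seq_b))
    amps_a = dft_amplitudes(encode(seq_a, n))
    amps_b = dft_amplitudes(encode(seq_b, n))
    freqs = [k / n for k in range(1, n // 2 + 1)]
    cross = [a * b for a, b in zip(amps_a, amps_b)]
    return freqs, cross


def consensus_frequency(seq_a, seq_b):
    """Frequency at which the cross-spectrum is largest."""
    freqs, cross = cross_spectrum(seq_a, seq_b)
    peak = max(range(len(cross)), key=cross.__getitem__)
    return freqs[peak]
```

A prominent shared peak is only a candidate indicator of interaction; as noted in item 2, the dependence of such predictions on sequence content would still need to be examined.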
References [1] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE, “The protein data bank,” Nucleic Acids Res., vol. 28(1), pp. 235–242, 2000. xi, 4, 55 [2] Larosa GJ, Davide JP, Weinhold K, Waterbury JA, Profy AT, Lewis JA, Langlois AJ, Dreesman GR, Boswell RN, Shadduck P, Holley LH, Karplus M, Bolognesi DP, Matthews TJ, Emini EA, Putney SD, “Conserved sequence and structural elements in the hiv-1 principal neutralizing determinant,” Science, vol. 249, pp. 932–935, 1990. xiii, 4, 78, 80 [3] Tomii K, Kanehisa M, “Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins,” Protein Engineering, vol. 9(1), pp. 27–36, 1996. xxiii, 5, 13, 29, 43, 97, 110, 126, 127, 128, 130, 138, 166, 171, 192 [4] Wilce MC, Aguilar MI, Hearn MT, “Physicochemical basis of amino acid hydrophobicity scales: evaluation of four new scales of amino acid hydrophobicity coefficients derived from rp-hplc of peptides,” J Anal Chem, vol. 67, pp. 1210–1219, 1995. xxiii, 5, 40, 127, 129 [5] Robson KJH, Naita S, Barker G, Sinden RE, Crisanti A, “Cloning and expression of the thrombospondin-related adhesive protein gene of plasmodium berghei,” Molecular and Biochemical Parasitology, vol. 84, pp. 1–12, 1997. xxiii, 5, 127, 149, 154 [6] Collman R, Balliet JW, Gregory SA, Friedman H, Kolson DL, Nathanson N, Srinivasan A., “An infectious molecular clone of an unusual macrophage-
tropic and highly cytopathic strain of human immunodeficiency virus type 1,” Journal of Virology, vol. 66(12), pp. 7517–7521, 1992. xxiii, 5, 78, 82, 83, 94, 95, 96, 106, 107, 116, 127 [7] Chen H, Gu F, Liu F, “Predicting protein secondary structure using continuous wavelet transform and chou-fasman method,” Conf Proc IEEE Eng Med Biol Soc., vol. 2005(3), pp. 2603–6, 2005. xxiv, 5, 36, 169, 171, 173, 174, 176, 179 [8] Kyte J, Doolittle RF, “A simple method for displaying the hydropathic character of a protein,” J Mol Biol., vol. 157(1), pp. 105–132, 1982. xxiv, 5, 40, 55, 169, 171, 173, 174, 179, 194 [9] Qiu J, Liang R, Zou X, Mo J, “Prediction of protein secondary structure based on continuous wavelet transform,” Talanta, vol. 61, pp. 285–293, 2003. xxiv, 5, 36, 57, 58, 169, 171, 172, 173, 174, 179, 180 [10] S. C. Wolfenden RV, Cullis PM, “Water, protein folding, and the genetic code,” Science, vol. 7206(4418), pp. 575–577, 1979. xxiv, 5, 27, 30, 169, 171, 173, 174, 176 [11] Ban TA, “The role of serendipity in drug discovery,” Dialogues Clin Neurosci., vol. 8(3), pp. 335–344, 2006. 5 [12] Schmidt B, Ribnicky DM, Poulev A, Logendra S, Cefalu WT, Raskin I, “A natural history of botanical therapeutics,” Metabolism Clinical and Experimental, vol. 57 (Suppl 1), pp. S3–S9, 2008. 5 [13] W. O. F. A. H. (OIE), OIE International Standards on Antimicrobial Resistance. 2008. 5, 6, 23, 24 [14] W. O. F. A. H. (OIE), Laboratory Methodologies for Bacterial Antimicrobal Susceptiblity Test’. 2008. 5, 6, 23, 24 [15] G. C. Garrett GH, Biochemistry. Casebound, 2009. 5 [16] Krogsgaard-Larsen P, Stromgaard K, Madsen U, Textbook of Drug Design and Discovery. CRC Press/Taylor & Francis, 2009. 5
[17] Cavalieri SJ, Harbeck RJ, McCarter YS, Ortez JH, Rankin ID, Sautter RL, Sharp SE, Spiegel CA, Manual of Antimicrobial Susceptibility Testing. American Society for Microbiology., 2005. 6, 23 [18] Day M, Vane JR, “An analysis of the direct and indirect actions of drugs on the isolated guinea-pig ileum,” Brit. J. Pharmacology, vol. 20, pp. 150–170, 1963. 6, 23 [19] Lasonder E, Janse CJ, van Gemert G, Mair GR, Vermunt AMW, Douradinh BG, van Noort V, Huynen MA, Luty AJF, Kroeze H, Khan SM, Sauerwein RW, Waters AP, Matthias Mann M, Stunnenberg HG, “Proteomic profiling of plasmodium sporozoite maturation identifies new proteins essential for parasite development and infectivity,” PLoS Pathog, vol. 4(10), p. e1000195, 2008. 6, 24, 149 [20] Ayala SC, “Checklist, host index, and annotated bibliography of plasmodium from reptiles,” Journal of Eukaryotic Microbiology, vol. 25 (1), pp. 87– 100, 1978. 6 [21] Zumla A, “Reflection & reaction: Drugs for neglected diseases,” THE Lancet Infectious Diseases, vol. 2, p. 393, 2002. 6 [22] Amico P, Aran C, Avila C, “Hiv spending as a share of total health expenditure: An analysis of regional variation in a multi-country study,” PLoS ONE, vol. 5(9), p. e12997, 2010. 6 [23] Panel on Antiretroviral Guidelines for Adults and Adolescents, “Guidelines for the use of antiretroviral agents in hiv-1-infected adults and adolescents,” Department of Health and Human Services, p. 1166, 2011. 7, 32 [24] Piche A, “Gene therapy for hiv infections: Intracellular immunization,” Can J Infect Dis., vol. 10(4), pp. 307–312, 1999. 7 [25] Catteruccia F, “Malaria vector control in the third millennium: progress and perspectives of molecular approaches,” Pest Manag Sci, vol. 63, pp. 634–640, 2007. 7
[26] Trape JF, “The public health impact of chloroquine resistance in africa,” Am. J. Trop. Med. Hyg., vol. 64, pp. 12–17, 2001. 7 [27] Langford SE, Ananworanich J, Cooper DA, “Predictors of disease progression in hiv infection: a review,” AIDS Res Ther., vol. 4, pp. 1–11, 2007. 7, 77 [28] Garcia JE, Puentes A, Patarroyo ME, “Developmental biology of sporozoite-host interactions in plasmodium falciparum malaria: implications for vaccine design,” Clin Microbiol Rev., vol. 19(4), pp. 686–707, 2006. 7 [29] Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ Higgins DG, “Clustalw and clustalx version 2,” Bioinformatics, vol. 23(21), pp. 2947–294., 2007. 8, 25 [30] Katoh K, Misawa K, Kuma K, Miyataa T, “Mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform,” Nucleic Acids Res., vol. 30(14), pp. 3059–3066, 2002. 8, 25, 31 [31] Edgar RC, “Muscle: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Research, vol. 32(5), pp. 1792–1797., 2004. 8, 25, 31 [32] Notredame C, Higgins DG, Heringa J, “T-coffee: A novel method for fast and accurate multiple sequence alignment,” J Mol Biol., vol. 302(1), pp. 205–217, 2000. 8, 25, 31 [33] Veljkovic V, Cosic I, Dimitrijevic B, Lalovic D, “Is it possible to analyze dna and protein sequence by the method of digital signal processing,” IEEE Trans Biomed Eng., vol. 32(5), pp. 337–341, 1985. 8, 26, 27, 31, 40, 125, 131, 137 [34] Lodish H, Berk A, Zipursky SL, Matsudaira P, Baltimore D, Darnell J, Molecular Cell Biology. W. H. Freeman and Company., 2000. 8, 27, 172
[35] Cosic I, “Macromolecular Bioactivity: Is It Resonant Interaction between Macromolecules?-Theory and Applications,” IEEE Transactions on Biomedical Engineering, vol. 41(I), pp. 1101–1114, 1994. 9, 10, 13, 18, 27, 29, 30, 32, 33, 34, 35, 36, 40, 41, 48, 50, 51, 52, 80, 81, 82, 84, 87, 95, 97, 101, 102, 104, 110, 125, 150, 156, 160, 165, 166, 191 [36] Veljkovic V, Niman HL, Glisic S, Veljkovic N, Perovic V, Muller PC, “Identification of hemagglutinin structural domain and polymorphisms which may modulate swine h1n1 interactions with human receptor,” BMC Structural Biology, vol. 9(62), pp. 1–11, 2009. 10, 18, 31, 32, 33, 35, 40, 43, 52, 102, 137 [37] Doliana R, Veljkovic V, “Emilins interact with anthrax protective antigen and inhibit toxin action in vitro,” Matrix Biology, vol. 27, pp. 96–106, 2008. 10, 18, 30, 32, 33, 34, 102 [38] Misiti M, Misiti Y, Oppenheim G, Poggi J, Wavelet Toolbox User’s Guide. The MathWorks, 1996. 10, 32, 33, 35, 37, 44, 45, 47, 54, 171, 175, 179 [39] Lieberman-Blum SS, Fung HB, Bandres JC, “Maraviroc: a ccr5-receptor antagonist for the treatment of hiv-1 infection,” Clin Ther., vol. 30(7), pp. 1228–1250, 2008. 14, 116 [40] Hunt JS, Romanelli F, “Maraviroc, a ccr5 coreceptor antagonist that blocks entry of human immunodeficiency virus type 1,” Pharmacotherapy, vol. 29(3), pp. 295–304, 2009. 14, 116 [41] Almecija S, Moya-Sola S, Alba MD, “Early origin for human-like precision grasping: A comparative study of pollical distal phalanges in fossil hominins,” PLosone, vol. 5(7), pp. 11727–11737, 2010. 16, 120 [42] Alberch P, Gale EA, “A developmental analysis of an evolutionary trend: Digital reduction in amphibians,” Evolution, vol. 39(1), pp. 8–23, 1985. 17 [43] Pirogova E, Akay M, Cosic I, “Investigating the interaction between oncogene and tumour suppressor protein,” IEEE Transactions on Information
Technology in Biomedicine, vol. 31(1), pp. 10–15, 2009. 18, 34, 36, 37, 38, 54, 80, 82, 95, 102 [44] Cosic I, Nesic D, “Prediction of hot spots in sv40 enhancer and relation with experimental data,” Eur. J. Biochem., vol. 170, pp. 247–252, 1987. 18, 34, 36, 38, 102 [45] de Trad C H, Fang Q Cosic I, “The resonant recognition model (rrm) predicts amino acid residues in highly conserved regions of the hormone prolactin (prl),” Biophysical Chemistry, vol. 84(2), pp. 149–157, 2000. 18, 33, 34, 36, 102 [46] Pettit FK, Bare E, Tsai A, Bowie JU, “Hotpatch: A statistical approach to finding biologically relevant features on protein surfaces,” J Mol Biol., vol. 369(3), pp. 863–879, 2007. 18 [47] Chang DT, Weng YZ, Lin JH, Hwang MJ, Oyang YJ, “Protemot: prediction of protein binding sites with automatically extracted geometrical templates,” Nucleic Acids Res., vol. 34(Web Server issue), pp. W303–309, 2006. 18 [48] Engelen S, Trojan LA, Sacquin-Mora S, Lavery R, Carbone A, “Joint evolutionary trees: a large-scale method to predict protein interfaces based on sequence sampling,” PLoS Comput Biol., vol. 5(1), p. e1000267, 2009. 18 [49] C. PS, Practical application of computer-aided drug design. Marcel Dekker Ltd, 1997. 23, 26, 150 [50] K. L. Finn PW, “Computational approaches to drug design,” Algorithmica, vol. 25(2), pp. 347–371, 1999. 23, 26, 150 [51] P. EP, ed., Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data; Approved Guideline. NCCLS, 2000. 24 [52] Andrews JM, “Determination of minimum inhibitory concentrations,” J Antimicrob Chemother., vol. 48(1), pp. 5–16, 2001. 24
[53] Miller K, “Nih working definition of bioinformatics and computational biology,” Biomedical Computation Review, p. 1, 2000. 24, 26, 27 [54] International Human Genome Sequencing Consortium, “Finishing the euchromatic sequence of the human genome,” NATURE, vol. 2004, pp. 931– 945, 431. 24 [55] Pellegrini M, “Computational methods for protein function analysis,” Current Opinion in Chemical Biology, vol. 5, pp. 46–50, 2001. 25 [56] Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P, Molecular Biology of the Cell” 4th edition. Garland Science, New York, 2002. 25 [57] Chuaqui RF, Bonner RF, Best CJ, Gillespie JW, Flaig MJ, Hewitt SM, Phillips JL, Krizman DB, Tangrea MA, Ahram M, Linehan WM, Knezevic V, Emmert-Buck MR, “Post-analysis follow-up and validation of microarray experiments,” Nat Genet., vol. 32 Suppl, pp. 509–14, 2002. 25 [58] Veljkovic V, Metlas R, “Identification of nanopeptide from htlv-iii, arv-2 and lavbru envelope gp120 determining binding to t4 cell surface protein,” Cancer Biochem Biophys., vol. 10(2), pp. 91–106, 1988. 26, 101, 137 [59] Anastassiou D, “Genomic signal processing,” IEEE Signal processing Magazine, vol. 18, pp. 8–20, 2001. 26 [60] Luscombe NM, Greenbaum D, Gerstein M, “What is bioinformatics? a proposed definition and overview of the field,” Methods Inf Med., vol. 40(4), pp. 346–358, 2001. 27 [61] Mckee T, Mckee JR, Biochemistry: The Molecular Basis of Life. Oxford University Press, 2006. 27, 29 [62] Walther B, Sieber R, “Bioactive proteins and peptides in foods,” Int J Vitam Nutr Res., vol. 81(2-3), pp. 181–192, 2011. 27 [63] Hancock REW, “Peptide antibiotics,” Lancet, vol. 349, pp. 418–422, 1997. 27, 29
[64] Lazar GA, Marshall SA, Plecs JJ, Mayo SL, Desjarlais JR, “Designing proteins for therapeutic applications,” Current Opinion in Structural Biology, vol. 13, pp. 513–518, 2003. 28 [65] Storici P, Toss1 A, Lenarcic B, Romeo D, “Purification and structural characterization of bovine cathelicidins, precursors of antimicrobial peptides,” Eur. J. Biochem., vol. 238, pp. 769–776, 1996. 28 [66] Thomma BPHJ, Cammue BPA, Thevissen K, “Plant defensins,” Planta, vol. 216, pp. 193–202, 2002. 28, 29 [67] Hancock REW, Chapple DS, “Peptide antibiotics,” Antimicrobial Agents And Chemotherapy, vol. 43(6), pp. 1317–1323, 1999. 27, 28, 29 [68] He Y, Xiao Y, Song H, Liang Q, Ju D, Chen X, Lu H, Jing W, Jiang S, Zhang L, “Design and evaluation of sifuvirtide, a novel hiv-1 fusion inhibitor,” J Biol Chem., vol. 283(17), pp. 11126–11134, 2008. 29, 69, 71 [69] Koletzko B, Goulet O, Hunt J, Krohn K, Shamir R, “Guidelines on paediatric parenteral nutrition of the european society of paediatric gastroenterology, hepatology and nutrition (espghan) and the european society for clinical nutrition and metabolism (espen), supported by the european society of paediatric research (espr),” Journal of Pediatric Gastroenterology and Nutrition, vol. 41, pp. 1–87, 2005. 29 [70] Hoj L, Kjaer J, Winther O, Cozzi-Lepri A, Lundgreen DJ, “In silico identification of physicochemical properties at mutating position positions relevant to reducing susceptibility to amprenavir,” XVII International HIV Drug Resistance Workshop, vol. Poster No.113, 2008. 29, 32, 40, 130, 139 [71] Nair AS, Sreenadhan SP, “A coding measure scheme employing electron-ion interaction pseudopotential (eiip),” Bioinformation, vol. 1(6), pp. 197–202, 2006. 29 [72] Frohlich H, “Long-range coherence and energy storage in biological systems,” J. Quantum Chem., vol. II, pp. 641–649, 1968. 29, 30
[73] Veljkovic V, Slavic I, “Simple general-model pseudopotential,” Phys. Rev. Lett. 29, 105107 (1972), vol. 29, pp. 105–107, 1972. 29, 30 [74] Veljkovic V, “The dependence of the fermi energy on the atomic number,” Phys Lett., vol. 45A, pp. 41–42, 1973. 29, 30 [75] Stretton AOW, “The first sequence: Fred sanger and insulin,” Genetics, vol. 162, pp. 527–532, 2002. 30 [76] Bandeira N, Pham V, Pevzner P, Arnott D, Lill JR, “Beyond edman degradation: Automated de novo protein sequencing of monoclonal antibodies,” Nat Biotechnol., vol. 26(12), pp. 1336–1338, 2008. 30 [77] Kelly RP, Palumbi SR, “General-use polymerase chain reaction primers for amplification and direct sequencing of enolase, a single-copy nuclear gene, from different animal phyla,” Molecular Ecology Resources, vol. 9, pp. 144– 147, 2009. 30 [78] Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW, “Genbank,” Nucleic Acids Res., vol. 37(Database issue), pp. D26–D31, 2009. 31 [79] Jain E, Bairoch A, Duvaud S, Phan I, Redaschi N, Suzek BE, Martin MJ, McGarvey P, Gasteiger E, “Infrastructure for the life sciences: design and implementation of the uniprot website,” BMC Bioinformatics, vol. 10, pp. 136–154, 2009. 12, 31, 59, 81, 102, 103, 151, 155, 161, 165, 179, 190 [80] Bernstein FC, Koetzle TF, Williams GJ, Meyer EE Jr., Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M, “The protein data bank: A computer-based archival file for macromolecular structures,” J. of. Mol. Biol., vol. 112 (1977), p. 535, 1977. 31, 171, 172, 173, 179 [81] Hulo C, de Castro E, Masson P, Bougueleret L, Bairoch A, Xenarios I, Le Mercier P, “Viralzone: a knowledge resource to understand virus diversity,” Nucleic Acids Res. (England), vol. 39 (Database issue), pp. D576–582, 2011. 31
[82] Bader GD, Betel D, Hogue CWV, “Bind: the biomolecular interaction network database,” Nucleic Acids Res., vol. 31, pp. 248–250, 2003. 31 [83] Sigrist CJA, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N, “Prosite, a protein domain database for functional characterization and annotation,” Nucleic Acids Res., vol. 38(Database issue), pp. 161–166, 2010. 31 [84] Altschul SF, Gish W, Miller W, Myers EW,Lipman DJ, “Basic local alignment search tool,” J. Mol. Biol., vol. 215, pp. 403–410, 1990. 31 [85] L. D. Pearson WR, “Improved tools for biological sequence comparison,” PNAS, vol. 85(8), pp. 2444–2448, 1988. 31 [86] Apweiler R, Martin MJ, O’Donovan, Pruess M, “Managing core resources for genomics and proteomics,” Pharmacogenomics, vol. 4(3), pp. 343–350, 2003. 31 [87] Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ and Higgins DG, “Clustalw and clustalx version 2,” Bioinformatics, vol. 23(21), pp. 2947–2948., 2007. 31 [88] Smith SW, The Scientist and Engineer’s Guide to Digital Signal Processing. California Technical Publishing, 2002. 31, 32, 33, 35, 36, 40, 43, 44, 45, 46, 48, 50, 84 [89] Cosic I, Nesic D, Pavlovic M, Williams R, “Enhancer binding proteins predicted by informational spectrum method,” Biophys. Biochem. Res. Comm., vol. 141, pp. 831–839, 1986. 31, 33 [90] Cosic I, Hearn MTW, “Studies on protein-dna interactions using the resonant recognition model,” Eur. J. Hiochem., vol. 205, pp. 613–619, 1992. 31 [91] Cosic I, Pirogova E, “Bioactive peptide design using the resonant recognition model,” Nonlinear Biomedical Physics, vol. 1(1), pp. 7–17, 2007. 32
[92] Nwankwo N, Seker H, “A signal processing-based bioinformatics approach to assessing drug resistance: Human immunodeficiency virus as a case study,” Proc. of IEEE EMBS, vol. 2010, pp. 1836–1839, 2010. 32, 35, 125, 137 [93] Veljkovic V. and Veljkovic N, “Characterization of conserved properties of hemagglutinin of h5n1 and human influenza viruses: possible consequences for therapy and infection control,” BMC Structural Biology, vol. 9(21), pp. 1–11, 2009. 32, 34, 40, 43, 52, 102, 137 [94] Ho YS, Abecasis AB, Theys K, Deforche K, Dwyer DE, Charleston M, Vandamme AM, Saksena NK, “Hiv-1 gp120 n-linked glycosylation differs between plasma and leukocyte compartments,” Virol J., vol. 5, pp. 14–23, 2008. 32, 40 [95] Cosic I, Pavlovic M, Vojisavljevic V, “Prediction of hot spots in interleukin2 based on informational spectrum characteristics of growth regulating factors,” Biochimie, vol. 71, pp. 333–342, 1989. 33 [96] Cosic I, Nesic D, Pavlovic M, Williams R, “Enhancer binding proteins predicted by informational spectrum method,” Biophys. Biochem. Res. Comm., vol. 141, pp. 831–839, 1986. 33 [97] Pirogova E, Cosic I, “Application of signal processing in computational analysis of hsps interactions,” Proc. of the second IASTED Intl Conf on Biomedical Engineering, vol. 2004, pp. 345–351, 2004. 34, 80, 102 [98] Cosic I, “Analysis of hiv proteins using dsp techniques,” Proc. of IEEE EMBS, vol. 2002(3), pp. 2886 – 2889, 2001. 34, 36, 54, 80, 82, 86, 95, 101, 102, 104, 114, 121, 160, 166, 191 [99] Vaidyanathan PP, “Genomics and proteomics: A signal processors tour,” Proc. of IEEE Circuits and Systems Society, vol. 4(4), pp. 6 – 29, 2004. 34, 50, 80
[100] Lio P, “Wavelets in bioinformatics and computational biology: state of art and perspectives,” Bioinformatics Review, vol. 19(1), p. 29., 2003. 35, 37, 53 [101] Ramachandran P, Antoniou A, Vaidyanathan PP, “Identification and location of hot spots in proteins using the short-time discrete fourier transform,” Pro. of IEEE Signals, Systems and Computers, vol. 2004(2), pp. 1656–1660, 2004. 36 [102] Murray KB, Gorse D, Thornton JM, “Wavelet transforms for the characterization and detection of repeating motifs,” J. Mol. Biol., vol. 316, pp. 341–363, 2002. 36, 37, 54 [103] Prabhakaran M, “The distribution of physical, chemical and conformational properties in signal and nascent peptides,” Biochem J., vol. 269(3), pp. 691– 6, 1990. 36, 40, 53 [104] Rao KD, Swamy MNS, “Analysis of genomics and proteomics using dsp techniques,” IEEE Transactions on Circuitts and Systems, vol. 55(1), pp. 370–378, 2008. 36, 38, 54, 175, 192 [105] Jishnu S PG, “Wavelet analysis of coding and noncoding regions of dna sequences,” National Conference on Technological Trends, vol. 2009, pp. 338– 342, 2009. 36 [106] Hirakawa H, Muta S, Kuhara S, “The hydrophobic cores of proteins predicted by wavelet analysis,” Bioinformatics, vol. 4, pp. 141–148, 1999. 36 [107] Mandell AJ, Selz KA, Shlesinger MF, “Wavelet transformation of protein hydrophobicity sequences suggests their membership in structural families,” Physica A, vol. 244, pp. 254–262., 1997. 36, 173 [108] Lio P, Vannucci M, “Wavelet change-point prediction of transmembrane proteins,” Bioinformatics, vol. 16, pp. 376–382, 2000. 36 [109] Pattini L, Cerutti S, “Hydrophobicity analysis of protein primary structures to identify helical regions,” Methods Inf Med, vol. 43, pp. 102–105, 2004. 36, 37, 55, 57, 171, 179
[110] Dodin G, Vandergheynst P, Levoir P, Cordier C, Marcourt L, “Fourier and wavelet transform analysis, a tool for visualizing regular patterns in dna sequences,” J. Theoret. Biol., vol. 206, pp. 323–326, 2000. 36 [111] E. J. T. P. Tsonis A A, Kumar P, “Wavelet analysis of dna sequences,” Phys. Rev. E, vol. 53, pp. 1828–1834, 1996. 37 [112] Cosic I, Fang Q, “Evaluation of different wavelet constructions (designs) for analysis of protein sequences,” DSP, vol. 2002, pp. 1117–1120, 2002. 37, 38, 175, 192 [113] Eisenberg D, Weiss RM, Terwilliger TC, “The hydrophobic moment detects periodicity in protein hydrophobicity,” Proc Natl Acad Sci, vol. 81, pp. 140– 144, 1984. 37, 55, 171 [114] Daubechies I, Ten Lectures on Wavelets. SIAM, 1992. 37, 53, 171 [115] de Trad CH, Fang Q, Cosic I., “Protein sequence comparisons based on the wavelet transform approach,” Protein Eng., vol. 15(3), pp. 193–203., 2002. 37, 38, 80, 82, 95 [116] Amara G, “An introduction to wavelets,” IEEE Computational Science & Engineering, vol. 2(2), pp. 50–61, 1995. 38, 54, 175 [117] Walse VA, Hattotuwagama CK, Doytchinova IA, Wong M, Macdonnald IK, Mulder A, Claas FHJ, Pellegrino P, Turner J, Williams I, Turnbull EL, Borrow P, Flower DR, “Integrating in silico and in vitro analysis of peptide binding affinity to hla-cw*0102: a bioinformatic approach to the prediction of new epitopes,” Plosone, vol. 4(11), p. e8095, 2009. 39, 41 [118] Hopp TP, Woods KR, “Prediction of protein antigenic determinants from amino acid sequences,” Proc Natl Acad Sci U S A, vol. 78(6), pp. 3824–3828, 1981. 40 [119] P. M. T. J. G. E. Kuhn LA, Swanson CA, “Atomic and residue hydrophilicity in the context of folded protein structures,” Proteins, vol. 23(4), pp. 536– 547, 1995. 40
[120] Koehl P, Levitt M, “Structure-based conformational preferences of amino acids,” PNAS, vol. 96(22), pp. 12524–12529, 1999. 40 [121] Kumar S, Tsai CJ, Nussinov R, “Factors enhancing protein thermostability,” Protein Eng., vol. 13(3), pp. 179–191, 2000. 40 [122] Andersen CAF, Brunak S, “Representation of protein-sequence information by amino acid subalphabets,” AI Magazine, vol. (25)1, pp. 97–104, 2004. 43 [123] Fernandez L, Caballero J, Abreu JI, Fernandez M, “Amino acid sequence autocorrelation vectors and bayesian-regularized genetic neural networks for modeling protein conformational stability: Gene v protein mutants,” PROTEINS: Structure, Function, and Bioinformatics, vol. 67, p. 834852, 2007. 43 [124] Bajic VB, Bajic IV, “Some problems in application of information spectrum method and resonant recognition model for cross-spectral analysis of dna/rna sequences,” Proc. of Communications and Signal Processing, vol. 1998, pp. 219–224, 1998. 44 [125] Chrysostomou C, Seker H, Aydin N, “Effects of windowing and zeropadding on complex resonant recognition model for protein sequence analysis,” EMBS, EMBC, pp. 4955 – 4958, 2011. 46 [126] Cochrane JH, Time Series for Macroeconomics and Finance. Spring, 1997. 47, 84 [127] Cooley JW, Tukey JW, “An algorithm for the machine calculation of complex fourier series,” Mathematics Computation, vol. 19, pp. 297–301, 1965. 48 [128] Cosic I, Hearn MT, “Hot spot’ amino acid distribution in ha-ras oncogene product p21: relationship to guanine binding site,” J Mol Recognit., vol. 4(2-3), pp. 57–62, 1991. 50
[129] Nwankwo N, Seker H, “Assessment of the binding characteristics of human immunodeficiency virus type 1 glycoprotein120 and host cluster of differentiation4 using digital signal processing,” BIBE, vol. 2010, pp. 289–290, 2010. 52, 114 [130] Viari A, Soldano H, Ollivier E, “A scale-independent signal processing method for sequence analysis,” Comput. Appl. Biosci., vol. 6, pp. 71–80, 1990. 52 [131] Kiyota T, Lee S, Sugihara G, “Design and synthesis of amphiphilic alphahelical model peptides with systematically varied hydrophobic-hydrophilic balance and their interaction with lipid- and bio-membranes,” Biochemistry, vol. 35(40), pp. 13196–13204, 1996. 53 [132] Pattini L, Riva L, Cerutti S, “A wavelet based method to predict the alpha helix content in the secondary structure of globular proteins,” Proc. of IEEE-EMBS, vol. 2002, pp. 132 – 133, 2002. 55 [133] Leszczynski JF, Rose GD, “Loops in globular proteins: A novel category of secondary structure,” Science, vol. 234(4778), pp. 849–855, 1986. 57, 58, 172, 173, 180 [134] Puentes A, Garcia J, Vera R, Lpez R, Suarez J, Rodriguez L, Curtidor H, Ocampo M, Tovar D, Forero M, Bermudez A, Cortes J, Urquiza M, Patarroyo ME, “Sporozoite and liver stage antigen plasmodium falciparum peptides bind specifically to human hepatocytes,” Vaccine, vol. 22, pp. 1150– 1156, 2004. 59, 149 [135] Chatterjee S, Wery M, Sharma P, Chauhan VS, “A conserved peptide sequence of the plasmodium falciparum circumsporozoite protein and antipeptide antibodies inhibit plasmodium berghei sporozoite invasion of hepg2 cells and protect immunized mice against p. berghei sporozoite challenge,” Infection and Immunity, vol. 63(11), pp. 4375–4381, 1995. 59, 66 [136] Pinzon-Ortiz C, Friedman J, Esko J, Sinnis P, “The binding of the circumsporozoite protein to cell surface heparan sulfate proteoglycans is re-
quired for plasmodium sporozoite attachment to target cells,” J Biol Chem., vol. 276(29), pp. 26784–26791, 2001. 63, 66 [137] Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M, “Aaindex: amino acid index database,” Nucleic Acids Res., vol. 36(Database issue):D, pp. 202–205, 2008. 63, 66 [138] Cerami C, Frevert U, Sinnis P, Takacs B, Clavijo P, Santos MJ, Nussenzweig V, “The basolateral domain of the hepatocyte plasma membrane bears receptors for the circumsporozoite protein of plasmodium falciparum sporozoites,” Cell, vol. 70, pp. 1021–1033, 1992. 66 [139] Frevert U, Galinski MR, Hugel F, Allon N, Schreier H, Smulevitch S, Shakibaei M, Clavijo P, “Malaria circumsporozoite protein inhibits protein synthesis in mammalian cells,” The EMBO Journal, vol. 17(14), pp. 3816–3826, 1998. 66, 149 [140] Spek E J, Bui A H, Lu M, Kallenbach N R, “Surface salt bridges stabilize the gcn4 leucine zipper,” Protein Sci., vol. 7, pp. 2431–2437, 1998. 69 [141] Marqusee S, Baldwin R L, “Helix stablization by glu lysine salt bbidge in short peptides of de no design,” PNAS, vol. 84, pp. 8898–8902, 1987. 69 [142] Chitnis CE, Sharma A, “Targeting the plasmodium vivax duffy-binding protein,” Trends in Parasitology, vol. 24(1), pp. 29–34, 2007. 74, 75, 155, 156 [143] Dong J, Ye P, Schade AJ, Shan Gao, Romo GM, Turner NT, McIntire LV, Lopez JA, “Tyrosine sulfation of glycoprotein iba: Role of electrostatic interactions in von willebrand factor binding,” The Journal of Biological Chemistry, vol. 276(20), pp. 16690–16694, 2001. 74 [144] Smith PL, Dalgleish A, “Subtle mimicry of hla by hiv-1 gp120 a role for anti hla antibodies?,” The Open Autoimmunity Journal, vol. 2, pp. 104–116, 2010. 77
[145] Bandera A, Ferrario G, Saresella M, Marventano I, Soria A, Zanini F, Sabbatini F, Airoldi M, Marchetti G, Franzetti F, Trabattoni D, Clerici M, Gori A, “Cd4+ t cell depletion, immune activation and increased production of regulatory t cells in the thymus of hiv-infected individuals,” PLoS One, vol. 5(5), pp. e10788–e10795, 2010. 77, 79 [146] Corbeil J, Richman DD, “Productive infection and subsequent interaction of cd4-gp120 at the cellular membrane is required for hiv-induced apoptosis of cd4+ t cells,” J. Gen Virol., vol. 76, pp. 681–690, 1995. 77 [147] Lernmark A, “Autoimmune diseases: are markers ready for prediction?,” J Clin Invest., vol. 108(8), pp. 1091–1096, 2001. 77 [148] Karlsson A, Parsmyr K, Sandstrm E, Fenyo EM, Albert J, “Mt-2 cell tropism as prognostic marker for disease progression in human immunodeficiency virus type 1 infection,” J. of Clinical Microbiol., vol. 32(2), pp. 364–370, 1994. 77, 78 [149] Pantaleo G, Graziosi C, Fauci AS, “New concepts in the immunopathogenesis of human immunodeficiency virus infection,” N Engl J Med., vol. 328(5), pp. 327–335, 1993. 77 [150] C.-P. R. W. R. Nagy K, Clapham P, “Human t-cell leukemia virus type i: Induction of syncytia and inhibition by patients’ sera,” International Journal of Cancer, vol. 32(3), p. 321328, 1983. 78 [151] Doranz BJ, Rucker J, Yi Y, Smyth RJ, Samson M, Peiper CS, Parmentier M, Collman RG, Doms RW, “A dual-tropic primary hiv-1 isolate that uses fusin and the b-chemokine receptors ckr-5, ckr-3, and ckr-2b as fusion cofactors,” Cell, vol. 85, pp. 1149–1158, 1996. 78, 82, 96, 101, 106 [152] Ribeiro RM, Hazenberg MD, Perelson AS, Davenport MP, “Naive and memory cell turnover as drivers of ccr5-to-cxcr4 tropism switch in human immunodeficiency virus type 1: Implications for therapy,” Journal of Virology, vol. 80, pp. 802–809, 2006. 78, 79, 96, 184
[153] Lusso P, “Hiv and the chemokine system: 10 years later,” The EMBO Journal, vol. 25, pp. 447–456, 2006. 79 [154] Polzer S, Dittmar MT, Schmitz H, Schreiber M, “The n-linked glycan g15 within the v3 loop of the hiv-1 external glycoprotein gp120 affects coreceptor usage, cellular tropism, and neutralization,” Virology, vol. 304, pp. 70–80, 2002. 78, 79 [155] Callaway DS, Ribeiro MR, Nowak MA, “Virus phenotype switching and disease progression in hiv-1 infection,” Proc. R. Soc. Lond., vol. 266, pp. 2523–2530, 1999. 78, 79, 82, 95, 101, 106, 115, 116, 117, 186 [156] Chen JJ, Cloyd MW, “The potential importance of hiv-induction of lymphocyte homing of lymph nodes,” International Immunology, vol. 11(10), pp. 1591–1594, 1999. 78, 79 [157] Blaak H, Vant-Wout AB, Brouwer M, Hooibrink B, Hovenkamp E, Schuitemaker H, “In vivo hiv-1 infection of cd45ra+ cd4+ t cells is established primarily by syncytium-inducing variants and correlates with the rate of cd4+ t cell decline,” PNAS, vol. 97(3), pp. 1269–1274, 1999. 78, 102 [158] Dejucq N, Simmons G, Clapham RP, “T-cell line adaptation of human immunodeficiency virus type 1 strain sf162: effects on envelope, vpu and macrophage-tropism,” Journal of General Virology, vol. 81, pp. 2899–2904, 2000. 79, 94, 96, 101, 103, 107, 116 [159] Regoes RR, Bonhoeffer S, “The hiv coreceptor switch: a population dynamical perspective,” TRENDS in Microbiology, vol. 13, pp. 269–277, 2005. 79 [160] Kinter A., Catanzaro A., Monaco J., Ruiz M., Justement J., Moir S., Arthos J., Oliva A., Ehler L., Mizell S., Jackson R., Ostrowski M., Hoxie J., Offord R. and Fauci A S., “Cc-chemokines enhance the replication of t-tropic strains of hiv-1 in cd4+ t-cells: Role of signal transduction,” Proc. Natl. Acad. Sci. USA, vol. 95, pp. 11880–11885, 1998. 82, 106, 107
[161] Hirsch I, de Mareuil J, Salaun D, Chermann JC, “Genetic control of infection of primary macrophages with t-cell-tropic strains of hiv-1,” Virology, vol. 219, pp. 257–261, 1996. 82, 106 [162] Chowdhury HI, Bentsman G, Choe W, Potash JM, Volsky JD, “The macrophage response to hiv-1: Intracellular control of x4 virus replication accompanied by activation of chemokine and cytokine synthesis,” Journal of NeuroVirology, vol. 8, pp. 599–610, 2002. 82, 83, 93, 101, 104, 106, 108, 114, 115, 117, 118, 186 [163] Pollakis G, Abebe A, Kliphuis A, Chalaby M I, Bakker M, Mengistu Y, Brouwer M, Goudsmit J, Schuitemaker H, Paxton W A, “Phenotypic and genotypic comparisons of ccr5- and cxcr4-tropic human immunodeficiency virus type 1 biological clones isolated from subtype c-infected individuals,” Journal of Virology, vol. 78(6), pp. 2841–2852, 2004. 82, 106, 107 [164] Miller ED, Duus KM, Townsend M, Yi Y, Collman R, Reitz M, Su L, “Human immunodeficiency virus type 1 iiib selected for replication in vivo exhibits increased envelope glycoproteins in virions without alteration in coreceptor usage: Separation of in vivo replication from macrophage tropism,” Journal of Virology, vol. 75(18), pp. 8498–8506, 2001. 82, 83, 104, 106, 107, 114, 115, 117, 118 [165] Sakaida H, Hori T, Yonezawa A., Sato A, Isaka Y, Yoshie O, Hattori T, Takashi Uchiyama T, “T-tropic human immunodeficiency virus type 1 (hiv1)-derived v3 loop peptides directly bind to cxcr-4 and inhibit t-tropic hiv-1 infection,” Journal of Virology, vol. 72(12), pp. 9763–9770, 1998. 82, 106, 107 [166] Wong JK, Ignacio CC, Torriani F, Havlir D, Fitch NJ, Richman DD, “In vivo compartmentalization of human immunodeficiency virus: Evidence from the examination of pol sequences from autopsy tissues,” Journal of Virology, vol. 71(3), pp. 2059–2071, 1997. 82, 106 [167] Chehimi J, Prakash K, Shanmugam V, Collman R, Jackson SJ, Bandyopadhyay S,Starr SE, “Cd4-independent infection of human peripheral dendritic
cells with isolates of human immunodeficiency virus type 1,” Journal of General Virology, vol. 74, pp. 1277–1285, 1993. 82, 106 [168] Stevenson M, Haggerty S, Lamonica C, Mann AM, Meier C, Wasiak A, “Cloning and characterization of human immunodeficiency virus type 1 variants diminished in the ability to induce syncytium-independent cytolysis,” Journal of Virology, vol. 64(8), pp. 3792–3803, 1990. 82, 106, 113, 114, 187 [169] Ivey-Hoyle M, Culp JS, Chaikin MA, Hellmig BD, Matthews TJ, Sweet RW, Rosenberg M, “Envelope glycoproteins from biologically diverse isolates of immunodeficiency viruses have widely different affinities for cd4,” PNAS, vol. 88, pp. 512–516, 1991. 82, 106 [170] Myers G., Korber B.,Wain-Hobson S., Jeang K-T., Henderson L. E. and Pavlakis G. N., Human Retroviruses and AIDS 1992: A Compilation and Analysis of Nucleic Acid and Amino Acid Sequences. Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM., 1992. 82, 106, 108, 119, 187 [171] Tokunaga K, Michael L, Greenberg ML, Morse MA, Cumming RIH, Lyerly K, Cullen BR, “Molecular basis for cell tropism of cxcr4-dependent human immunodeficiency virus type 1 isolates,” Journal of Virology, vol. 75(15), pp. 6776–6785, 2001. 82, 92, 95, 106 [172] Gurgo C, Guo HG, Franchini G, Aldovini A, Collalti E, Farrell K, WongStaal F, Gallo RC, Reitz MS Jr., “Envelope sequences of two new united states hiv-1 isolates,” Virology, vol. 164(2), pp. 531–536, 1988. 82, 106, 118, 187 [173] CDC Agent Summary Statement, “Current trends human t-lymphotropic virus type iii/ lymphadenopathy-associated virus,” MMWR, vol. 35(34), pp. 540–542,547–549, 1986. 82, 106, 119, 187 [174] Getchell JP, Heath JL, Hicks DR, Sporborg C, Mcgrath CR, Kalyanaraman VS, “Continuous production of a cytopathic human t-lymphotropic
virus in a permissive neoplastic t-cell line,” Journal of Clinical Microbiology, vol. 23(4), pp. 737–742, 1986. 82, 106 [175] Tanese N, Sodroski J, Haseltine WA, Goff SP, “Expression of reverse transcriptase activity of human t-lymphotropic virus type iii (htlv-iii/lav) in escherichia coli,” Journal of Virology, vol. 59(3), pp. 743–745, 1984. 82 [176] Blumberg BM, Epstein LG, Saito Y, Chen D, Sharer LR, Anand R, “Human immunodeficiency virus type 1 nef quasispecies in pathological tissue,” Journal of Virology, vol. 66(9), pp. 5256–5264, 1992. 82, 106 [177] Xiao L, Owen SM, Goldman I, Lal AA, deJong JJ, Goudsmit J, Lal RB, “Ccr5 coreceptor usage of non-syncytium-inducing primary hiv-1 is independent of phylogenetically distinct global hiv-1 isolates: Delineation of consensus motif in the v3 domain that predicts ccr-5 usage,” Virology, vol. 240, pp. 83–92, 1998. 82, 106, 107 [178] Schols D, Struyf S, Van Damme J, Est JA, Henson G, De Clercq E, “Inhibition of t-tropic hiv strains by selective antagonization of the chemokine receptor cxcr4,” J. Exp. Med., vol. 186(8), pp. 1383–1388, 1997. 83, 107 [179] Takeuchi Y, Hoshino H, Miura T, Hayami M, Clapham P. R, Weiss RA, “Preferential susceptibilities of adherent-cells derived from human brains to an hiv-1 mutant and hiv-2 and siv isolates,” Int Conf AIDS, vol. 7, p. 100 (abstract no. M.A.1035), 1991. 83, 107, 109, 119 [180] Lazdins JK, Klimkait T, Woods-Cook K, Walker M, Alteri E, Cox D, Cerletti N, Shipman R, Bilbe G, McMaster G, “The replicative restriction of lymphocytotropic isolates of hiv-1 in macrophages is overcome by tgfbeta,” AIDS research and human retroviruses, vol. 8(4), pp. 505–511, 1992. 
83, 107 [181] Gao F, Robertson DL, Carruthers CD, Morrison SG, Jian B, Chen Y, Barré-Sinoussi F, Girard M, Srinivasan A, Abimiku AG, Shaw GM, Sharp PM, Hahn BH, “A comprehensive panel of near-full-length clones and reference sequences for non-subtype b isolates of human immunodeficiency virus type 1,” Journal of Virology, vol. 72(7), pp. 5680–5698, 1998. 83, 107
[182] Torre VS, Marozsan AJ, Albright JL, Collins KR, Hartley O, Offord RE, Quinones-Mateu ME, Arts EJ, “Variable sensitivity of ccr5-tropic human immunodeficiency virus type 1 isolates to inhibition by rantes analogs,” Journal of Virology, vol. 74(10), pp. 4868–4876, 2000. 83 [183] Huet T, Kerbarh O, Schols D, Clayette P, Gauchet C, Dubreucq G., Vincent L., Bompais H., Mazinghien R, Querolle O, Salvador A, Lemoine J, Lucidi B, Balzarini J, Petitou M, “Long-lasting enfuvirtide carrier pentasaccharide conjugates with potent anti-human immunodeficiency virus type 1 activity,” Antimicrobial Agents and Chemotherapy, vol. 54(1), pp. 134–142, 2010. 83, 107 [184] Dorr P, Westby M, Dobbs S, Griffin P, Irvine B, Macartney M, Mori J, Rickett G, Smith-Burchnell C, Napier C, Webster R, Armour D, Price D, Stammen B, Wood A, Perros M, “Maraviroc (uk-427,857), a potent, orally bioavailable, and selective small-molecule inhibitor of chemokine receptor ccr5 with broad-spectrum anti-human immunodeficiency virus type 1 activity,” Antimicrobial Agents and Chemotherapy, vol. 49(11), pp. 4721–4732, 2005. 83, 107 [185] Huet T, Dazza MC, Brun-Vézinet F, Roelants GE, Wain-Hobson S, “A highly defective hiv-1 strain isolated from a healthy gabonese individual presenting an atypical western blot,” AIDS, vol. 3(11), pp. 707–715, 1989. 83 [186] Simon F, Mauclère P, Roques P, Loussert-Ajaka I, Müller-Trutwin MC, Saragosti S, Georges-Courbot MC, Barré-Sinoussi F, Brun-Vézinet F, “Identification of a new human immunodeficiency virus type 1 distinct from group m and group o,” Nat Med, vol. 4(9), pp. 1032–1037, 1998. 83, 93 [187] Decker JM, Bibollet-Ruche F, Wei X, Wang S, Levy D. N, Wang W, Delaporte E, Peeters M, Derdeyn CA, Allen S, Hunter E, Saag M. S, Hoxie JA, Hahn BH, Kwong PD, Robinson JE, Shaw GM, “Antigenic conservation and immunogenicity of the hiv coreceptor binding site,” J Exp Med., vol. 201(9), pp. 1407–1419, 2005. 83, 108
[188] Barnett SW, Quiroga M, Werner A, Dina D, Levy JA, “Distinguishing features of an infectious molecular clone of the highly divergent and noncytopathic human immunodeficiency virus type 2 uc1 strain,” Journal of Virology, vol. 67(2), pp. 1006–1014, 1993. 83, 108, 187 [189] Grez M, Dietrich U, Balfe P, von Briesen H, Maniar JK, Mahambre G, Delwart EL, Mullins JI, Rubsamen-Waigmann H, “Genetic analysis of human immunodeficiency virus type 1 and 2 (hiv-1 and hiv-2) mixed infections in india reveals a recent spread of hiv-1 and hiv-2 from a single ancestor for each of these viruses,” Journal of Virology, vol. 68(4), pp. 2161–2168, 1994. 83, 108 [190] Brandful JA, Coetzer ME, Cilliers T, Phoswa M, Papathanasopoulos MA, Morris L, Moore PL, “Phenotypic characterization of hiv type 1 isolates from ghana,” AIDS Research and Human Retroviruses, vol. 23(1), pp. 144– 152, 2007. 83, 93, 108 [191] Mosier E, “Distinct rates and patterns of human cd4+ t-cell depletion in hupbl-scid mice infected with different isolates of the human immunodeficiency virus,” Journal of Clinical Immunology, vol. 15(6), pp. 130–133, 1995. 83, 108 [192] Hirsch V, Riedel N, Kornfeld H, Kanki PJ, Essex M, Mullins JI, “Cross-reactivity to human tlymphotropic virus type iii/lymphadenopathyassociated virus and molecular cloning of simian t-cell lymphotropic virus type iii from african green monkeys,” Proc. Natl. Acad. Sci., vol. 83, pp. 9754–9758, 1986. 83, 109 [193] Puffer B A, Ohlmann S, Edinger A L, Carlin D, Sanchez M D, Reitter J, Watry D D, Fox H S, Desrosiers R C, Doms R W, “Cd4 independence of simian immunodeficiency virus envs is associated with macrophage tropism, neutralization sensitivity, and attenuated pathogenicity,” Journal of Virology, vol. 76(6), pp. 2595–2605, 2002. 83, 109 [194] Takehisa J, Kraus MH, Decker JM, Li Y, Keele BF, Bibollet-Ruche F, Zammit KP, Weng Z, Santiago ML, Kamenya S, Wilson ML, Pusey AE, Bailes
E, Sharp PM, Shaw GM, Hahn BH, “Generation of infectious molecular clones of simian immunodeficiency virus from fecal consensus sequences of wild chimpanzees,” Journal of Virology, vol. 81(14), pp. 7463–7475, 2007. 83, 109 [195] Endres MJ, Clapham PR, Marsh M, Ahuja M, Turner JD, McKnight A, Thomas JF, Stoebenau-Haggarty B, Choe S, Vance PJ, Wells TNC, Power CA Sutterwala SS, Doms RW, Landau NR, “Cd4-independent infection by hiv-2 is mediated by fusin/cxcr4,” Cell, vol. 87, pp. 745–756, 1996. 89, 90 [196] Jeffs SA, Shotton C, Balfe P, McKeating JA., “Truncated gp120 envelope glycoprotein of human immunodeficiency virus 1 elicits a broadly reactive neutralizing immune response,” J Gen Virol., vol. 83(Pt 11), pp. 2723–2732, 2002. 89, 90, 102, 104, 113, 116, 117, 187 [197] Lathey JL, Brambilla D, Goodenow MM, Nokta M, Rasheed S, Siwak EB, Bremer JW, Huang DD, Yi Y, Reichelderfer PS, Collman RG, “Co-receptor usage was more predictive than nsi/si phenotype for hiv replication in macrophages: is nsi/si phenotyping sufficient?,” Journal of Leukocyte Biology, vol. 68, pp. 324–330, 2000. 94, 107, 115, 186 [198] Fauci AS, “Multifactorial nature of human immunodeficiency virus disease: Implication for therapy,” Science, vol. 262, pp. 1011–1018, 1993. 98 [199] Chu C, Selwyn PA, “Diagnosis and initial management of acute hiv infection,” Am Fam Physician, vol. 81(10), pp. 1239–1244, 2010. 98 [200] Oxenius A, Price DA, Easterbrook PJ, O’Callaghan CA, Kelleher AD, Whelan JA, Sontag G, Sewell AK, Phillips RE, “Early highly active antiretroviral therapy for acute hiv-1 infection preserves immune function of cd8+ and cd4+ t lymphocytes,” PNAS, vol. 97(7), pp. 3382–3387, 2000. 98 [201] Koopman JS, Jacquez JA, Welch GW, Simon CP, Foxman B, Pollock SM, Barth-Jones D, Adams AL, Lange K, “The role of early hiv infection in the spread of hiv through populations,” Acquir Immune Defic Syndr Hum Retrovirol, vol. 14(3), pp. 249–258, 1997. 98
[202] Svicher V, D’Arrigo R, Alteri C, Andreoni M, Angarano G, Antinori A, Antonelli G, Bagnarelli P, Baldanti F, Bertoli A, Borderi M, Boeri E, Bonn I, Bruzzone B, Callegaro AP, Cammarota R, Canducci F, CeccheriniSilberstein F, Clementi M, Monforte AD, De Luca A, Di Biagio A, Di Gianbenedetto S, Di Perri G, Di Pietro M, Fabeni L, Fadda G, Galli M, Gennari W, Ghisetti V, Giacometti A, Gori A, Leoncini F, Maggiolo F, Maserati R, Mazzotta F, Micheli V, Meini G, Monno L, Mussini C, Nozza S, Paolucci S, Parisi S, Pecorari M, Pizzi D, Quirino T, Re MC, Rizzardini G, Santangelo R, Soria A, Stazi F, Sterrantino G, Turriziani O, Viscoli C, Vullo V, Lazzarin A, Perno CF; OSCAR Study Group, “Performance of genotypic tropism testing in clinical practice using the enhanced sensitivity version of trofile as reference assay: results from the oscar study group,” New Microbiol., vol. 33(3), pp. 195–206, 2010. 100, 101 [203] Krsmanovic V, Biquard JM, Sikorska-Walker M, Cosic I, Desgranges C, Trabaud MA, Whitfield JF, Durkin JP, Achour A, Hearn MT, “Investigations into the cross-reactivity of rabbit antibodies raised against nonhomologous pairs of synthetic peptides derived from hiv-1 gp120 proteins,” J Pept Res., vol. 52(5), pp. 410–420, 1998. 101 [204] Truneh A, Buck D, Cassatt DR, Juszczak R, Kassis S, Ryu SE, Healey D, Sweet R,Sattentau Q, “A region in domain 1 of cd4 distinct from the primary gp120 binding site is involved in hiv infection and virus-mediated fusion,” Journal of Biological Chemistry, vol. 266( 9), pp. 5942–5948, 1991. 101 [205] Sharma D, Balamurali MM, Chakraborty K, Kumaran S, Jeganathan S, Rashid U, Ingallinella P, Varadarajan R, “Protein minimization of the gp120 binding region of human cd4,” Biochemistry, vol. 44(49), pp. 16192– 16202, 2005. 101 [206] Schuitemaker H, van t Wout1 A B, Lusso P, “Clinical significance of hiv1 coreceptor usage,” Journal of Translational Medicine, vol. 9(Suppl 1), pp. S5–S21, 2010. 102
[207] Nkengasong JN, Peeters M, Nys P, Willems B, Piot P, van der Groen G, “Infectious virus titer, replicative and syncytium-inducing capacity of human immunodeficiency virus type 1,” J Med Virol., vol. 45(1), pp. 78–81, 1995. 102, 117 [208] Desai SM, Kalyanaraman VS, Casey JM, Srinivasan A, Andersen PR, Devare SG, “Molecular cloning and primary nucleotide sequence analysis of a distinct human immunodeficiency virus isolate reveal significant divergence in its genomic sequences,” Proc. Natl. Acad. Sci., vol. 83, pp. 8380–8384, 1986. 106 [209] Kuiken C. L., Foley B., Hahn B., Marx P. A., McCutchan F., Mellors J. W., Mullins J. I., Wolinsky S., and Korber B, Human Retroviruses and AIDS 1999: A Compilation and Analysis of Nucleic Acid and Amino Acid Sequences. Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, 1999. 106, 107, 108, 119 [210] François KO, Pannecouque C, Auwerx J, Lozano V, Pérez-Pérez MJ, Schols D, Balzarini J, “The phthalocyanine prototype derivative alcian blue: the first synthetic agent with selective anti-human immunodeficiency virus activity due to its gp120 glycan-binding potential,” Antimicrobial Agents and Chemotherapy, vol. 53(11), pp. 4852–4859, 2009. 107 [211] Heinzinger N., Baca-Regen L., Stevenson M. and Gendelman H. E, “Efficient synthesis of viral nucleic acids following monocyte infection by hiv-1,” Virology, vol. 206, pp. 731–735, 1995. 107 [212] Triques K, Bourgeois A, Vidal N, Mpoudi-Ngole E, Mulanga-Kabeya C, Nzilambi N, Torimiro N, Saman E, Delaporte E, Peeters M, “Near-full-length genome sequencing of divergent african hiv type 1 subtype f viruses leads to the identification of a new hiv type 1 subtype designated k,” AIDS Research and Human Retroviruses, vol. 16(2), pp. 139–151, 2000. 108 [213] Nkengasong JN, Willems B, Janssens W, Cheingsong-Popov R, Heyndrickx L, Barin F, Ondoa P, Fransen K, Goudsmit J, van der Groen G, “Lack of
correlation between v3-loop peptide enzyme immunoassay serologic subtyping and genetic sequencing,” AIDS, vol. 12, pp. 1405–1412, 1998. 108, 119 [214] Chakravarthy N, Spanias A, Iasemidis LD, Tsakalis KT, “Autoregressive modeling and feature analysis of dna sequences,” EURASIP Journal on Applied Signal Processing, vol. 2004(1), pp. 13–28, 2004. 114 [215] Shapshak P, Segal DM, Crandall KA, Fujimura RK, Zhang BT, Xin KQ, Okuda K, Petito CK, Eisdorfer C, Goodkin K, “Independent evolution of hiv type 1 in different brain regions,” Aids Research And Human Retroviruses, vol. 15(9), pp. 811–820, 1999. 118, 187 [216] Wain LV, Bailes E, Bibollet-Ruche F, Decker JM, Keele BF, Van Heuverswyn F, Li Y, Takehisa J, Ngole EM, Shaw GM, Peeters M, Hahn BH, Sharp PM, “Adaptation of hiv-1 to its human host,” Mol. Biol. Evol., vol. 24(8), pp. 1853–1860, 2007. 119 [217] Whiten A, Horner V, de Waal FBM, “Conformity to cultural norms of tool use in chimpanzees,” Nature, vol. 437, pp. 737–740, 2005. 120 [218] Hooper DC, “Mechanisms of action of antimicrobials: Focus on fluoroquinolones,” Clinical Infectious Diseases, vol. 32(Suppl 1), pp. S9–S15, 2001. 125 [219] Fidock DA, Nomura T, Talley AK, Cooper RA, Dzekunov SM, Ferdig MT, Ursos LM, Sidhu AB, Naud B, Deitsch KW, Su XZ, Wootton JC, Roepe PD, Wellems TE, “Mutations in the p. falciparum digestive vacuole transmembrane protein pfcrt and evidence for their role in chloroquine resistance,” Mol Cell, vol. 6(4), pp. 861–71, 2000. 125 [220] Shafer RW, Schapiro JM, “Hiv-1 drug resistance mutations: an updated framework for the second decade of haart,” AIDS Rev., vol. 10(2), pp. 67–84, 2008. 125, 128 [221] Rhee S, Gonzales MJ, Kantor R, Betts BJ, Ravela J, Shafer RW, “Human immunodeficiency virus reverse transcriptase and protease sequence
database,” Nucleic Acids Research, vol. 31(1), pp. 298–303, 2003. 125, 126, 128, 129, 137, 188 [222] Menendez-Arias L, “Molecular basis of human immunodeficiency virus drug resistance: an update,” Antiviral Res., vol. 85(1), pp. 210–231, 2010. 126, 128, 129, 130 [223] Rimsky LT, Shugars DC, Matthews TJ, “Determinants of human immunodeficiency virus type 1 resistance to gp41-derived inhibitory peptides,” J Virol., vol. 72(2), pp. 986–93, 1998. 126 [224] Huang H, Chopra R, Verdine GL, Harrison SC, “Structure of a covalently trapped catalytic complex of hiv-1 reverse transcriptase: implications for drug resistance,” Science, vol. 282, pp. 1669–1675, 1998. 127, 128 [225] Lazar GA, Marshall SA, Plecs JJ, Mayo SL, Desjarlais JR, “Lamivudine (3tc) resistance in hiv-1 reverse transcriptase involves steric hindrance with -branched amino acids,” Curr Opin Struct Biol., vol. 13(4), pp. 513–518, 2003. 128 [226] Sarafianos SG, Das K, Clark AD Jr, Ding J, Boyer PL, Hughes SH, Arnold E, “Lamivudine (3tc) resistance in hiv-1 reverse transcriptase involves steric hindrance with beta-branched amino acids,” Proc Natl Acad Sci U S A, vol. 96(18), pp. 10027–10032, 1999. 128 [227] Adamson CS, Ablan SD, Boeras I, Goila-Gaur R, Soheilian F, Nagashima K, Li F, Salzwedel K, Sakalian M, Wild CT, Freed EO, “In vitro resistance to the human immunodeficiency virus type 1 maturation inhibitor pa-457 (bevirimat),” J Virol., vol. 80(22), pp. 10957–10971, 2006. 129, 130 [228] Li F, Goila-Gaur R, Salzwedel K, Kilgore NR, Reddick M, Matallana C, Castillo A, Zoumplis D, Martin DE, Orenstein JM, Allaway GP, Freed EO, Wild CT, “Pa-457: a potent hiv inhibitor that disrupts core condensation by targeting a late step in gag processing,” PNAS, vol. 100(23), pp. 13555– 13560, 2003. 129
[229] Margot NA, Gibbs CS, Miller MD, “Phenotypic susceptibility to bevirimat in isolates from hiv-1-infected patients without prior exposure to bevirimat,” Antimicrob Agents Chemother., vol. 54(6), pp. 2345–2353, 2010. 129, 130 [230] Adamson CS, Freed EO, “Novel approaches to inhibiting hiv-1 replication,” Antiviral Research, vol. 85, pp. 119–141, 2010. 130 [231] Lensen AH, Bolmer-Van de Vegte M, van Gemert GJ, Eling WM, Sauerwein RW, “Leukocytes in a plasmodium falciparum-infected blood meal reduce transmission of malaria to anopheles mosquitoes,” Infect Immun., vol. 65(9), pp. 3834–3837, 1997. 148 [232] Smith TG, Serghides L, Patel SN, Febbraio M, Silverstein RL, Kain KC, “Cd36-mediated nonopsonic phagocytosis of erythrocytes infected with stage i and iia gametocytes of plasmodium falciparum,” Infect Immun., vol. 71(1), pp. 393–400, 2003. 148 [233] Khusmith. S.. M. Sedegah, and S. L. Hoffman, “Complete protection against plasmodium yoelii by adoptive transfer of a cd8+ cytotoxic t cell clone recognizing sporozoite surface protein 2,” Infect Immun., vol. 62(7), pp. 2979–2983, 1994. 149, 154 [234] Hoffman R, Welton ML, Klencke B, Weinberg V, Krieg R., “The significance of pretreatment cd4 count on the outcome and treatment tolerance of hivpositive patients with anal cancer,” Int. J. Radiation Oncology Biol. Phys., vol. 44(1), pp. 127–131, 1999. 149 [235] Ying P, Shakibaei M, Patankar MS, Clavijo P, Beavis RC, Clark GF, Frevert U, “The malaria circumsporozoite protein: Interaction of the conserved regions i and ii-plus with heparin-like oligosaccharides in heparan sulfate,” Experimental Parasitology, vol. 85, pp. 168–182, 1997. 149 [236] Polley SD, Tetteh KKA, Lloyd JM, Akpogheneta OJ, Greenwood BM, Bojang KA, Conway DJ, “Plasmodium falciparum merozoite surface protein 3 is a target of allele-specific immunity and alleles are maintained by natural
selection,” Journal of Infectious Diseases, vol. 195, pp. 279–287, 2007. 149, 154 [237] Rodriguez LE, Urquiza M, Ocampo M, Suarez J, Curtidor H, Guzman F, Vargas LE, Trivios M, Rosas M, Patarroyo ME, “Plasmodium falciparum eba-175 kda protein peptides which bind to human red blood cells,” Parasitology, vol. 120(Pt 3), pp. 225–235, 2000. 149 [238] Jongwutiwes S, Putaporntip C, Karnchaisri K, Seethamchai S, Hongsrimuang T, Kanbara H, “Positive selection on the plasmodium falciparum sporozoite threonine-asparagine-rich protein: Analysis of isolates mainly from low endemic areas,” Gene, vol. 410, pp. 139–146, 2008. 149 [239] Silvie O, Goetz K, Matuschewski K, “A sporozoite asparagine-rich protein controls initiation of plasmodium liver stage development,” PLoS Pathog, vol. 4, p. e1000086, 2008. 149 [240] Hafalla JC, Silvie O, Matuschewski K, “Cell biology and immunology of malaria,” Immunological Reviews, vol. 240, pp. 297–316, 2011. 149, 155 [241] Kadota K, Ishino T, Matsuyama T, Chinzei Y, Yuda M, “Essential role of membrane-attack protein in malarial transmission to mosquito host,” PNAS, vol. 101(46), pp. 16310–16315, 2004. 149, 154 [242] Pei X, An X, Guo X, Tarnawski M, Coppel R, Mohandas N, “Structural and functional studies of interaction between plasmodium falciparum knob-associated histidine-rich protein (kahrp) and erythrocyte spectrin,” J Biol Chem., vol. 280(35), pp. 31166–31171, 2005. 149 [243] Cifuentes G, Vanegas M, Martinez NL, Pirajan C, Patarroyo ME, “Structural characteristics of immunogenic liver-stage antigens derived from p. falciparum malarial proteins,” Biochemical and Biophysical Research Communications, vol. 384, pp. 455–460, 2009. 149 [244] Ishino T, Chinzei Y, Yuda M, “A plasmodium sporozoite protein with a membrane attack complex domain is required for breaching the liver sinusoidal
cell layer prior to hepatocyte infection,” Cellular Microbiology, vol. 7(2), pp. 199–208, 2005. 149, 154, 155 [245] Kariu T, Ishino T, Yano K, Chinzei Y, Yuda M, “Celtos, a novel malarial protein that mediates transmission to mosquito and vertebrate hosts,” Mol Microbiol., vol. 59(5), pp. 1369–1379, 2006. 149, 154 [246] Kafsack BFC, Carruthers VB, “Apicomplexan perforin-like proteins,” Communicative and Integrative Biology, vol. 3(1), pp. 18–23, 2010. 149, 155 [247] Schasfoort RBM, Tudos AJ, Handbook of surface plasmon resonance. Royal Society of Chemistry, 2008. 150 [248] Singh PA, Buscaglia CA, Wang Q, Levay A, Nussenzweig DR, Walker JR, Winzeler EA, Fujii H, Fontoura BMA, Nussenzweig V, “Plasmodium circumsporozoite protein promotes the development of the liver stages of the parasite,” Cell, vol. 131, pp. 492–504, 2007. 150, 151, 152, 190 [249] Foley M, Tilley L, Sawyer WH, Anders RF, “The ring-infected erythrocyte surface antigen of plasmodium falciparum associates with spectrin in the erythrocyte membrane,” Mol Biochem Parasitol., vol. 46(1), pp. 137–147, 1991. 151, 159, 190 [250] Kraemer SM, Smith JD, “A family affair: var genes, pfemp1 binding, and malaria disease,” Current Opinion in Microbiology, vol. 9, pp. 1–7, 2006. 151, 152, 160, 163, 165, 190 [251] Sultan AA, “Molecular mechanisms of malaria sporozoite motility and invasion of host cells,” Internatl Microbiol, vol. 2, pp. 155–160, 1999. 151, 152 [252] Rathore D, McCutchan TF, “Role of cysteines in plasmodium falciparum circumsporozoite protein: Interactions with heparin can rejuvenate inactive protein mutants,” PNAS, vol. 97(15), pp. 8530–8535, 2000. 152 [253] Vlachou D, Lycett G, Siden-Kiamos I, Blass C, Sinden RE, Louis C, “Anopheles gambiae laminin interacts with the p25 surface protein of
plasmodium berghei ookinetes,” Molecular and Biochemical Parasitology, vol. 112, pp. 229–237, 2001. 152 [254] Sahar T, Reddy KS, Bharadwaj M, Pandey AK, Singh S, Chitnis CE, Gaur D, “Plasmodium falciparum reticulocyte binding-like homologue protein 2 (pfrh2) is a key adhesive molecule involved in erythrocyte invasion,” PLoS ONE, vol. 6(2), p. e17102, 2011. 153 [255] Xainli J, Adams JH, King CL, “The erythrocyte binding motif of plasmodium vivax duffy binding protein is highly polymorphic and functionally conserved in isolates from papua new guinea,” Molecular and Biochemical Parasitology, vol. 111, pp. 253–260, 2000. 153 [256] Ntumngia FB, McHenry AM, Barnwell JW, Cole-Tobian J, King CL, Adams JH, “Genetic variation among plasmodium vivax isolates adapted to nonhuman primates and the implication for vaccine development,” Am J Trop Med Hyg., vol. 80(2), pp. 218–227, 2009. 153, 155 [257] Bolton MJ, Garry RF, “Sequence similarity between the erythrocyte binding domain 1 of the plasmodium vivax duffy binding protein and the v3 loop of hiv-1 strain mn reveals binding residues for the duffy antigen receptor for chemokines,” Virology Journal, vol. 8, pp. 45–55, 2011. 153 [258] Lopaticki S, Maier AG, Thompson J, Wilson DW, Tham W, Triglia T, Gout A, Speed TP, Beeson JG, Healer J, Cowman AF, “Reticulocyte and erythrocyte binding-like proteins function cooperatively in invasion of human erythrocytes by malaria parasites,” Infection and Immunity, vol. 79(3), pp. 1107–1117, 2011. 153, 157, 190 [259] Gaur D, Mayer DC, Miller LH, “Parasite ligand-host receptor interactions during invasion of erythrocytes by plasmodium merozoites,” Int J Parasitol, vol. 34, pp. 1413–29, 2004. 153 [260] Iyer J, Gruner AC, Renia L, Snounou G, Preiser PR, “Invasion of host cells by malaria parasites: A tale of two protein families,” Mol Microbiol., vol. 65, pp. 231–49, 2007. 153
[261] Spadafora C, Awandare GA, Kopydlowski KM, Czege J, Moch JK, Finberg RW, Tsokos GC, Stoute JA, “Complement receptor 1 is a sialic acid-independent erythrocyte receptor of plasmodium falciparum,” PLoS Pathog., vol. 6(6), pp. 1–13, 2010. 153 [262] Lamarque M, Besteiro S, Papoin J, Roques M, Vulliez-Le Normand B, Morlon-Guyot J, Dubremetz JF, Fauquenoy S, Tomavo S, Faber BW, Kocken CH, Thomas AW, Boulanger MJ, Bentley GA, Lebrun M, “The ron2-ama1 interaction is a critical step in moving junction-dependent invasion by apicomplexan parasites,” PLoS Pathog., vol. 7(2), p. e1001276, 2011. 153 [263] Cao J, Kaneko O, Thongkukiatkul A, Tachibana M, Otsuki H, Gao Q, Tsuboi T, Torii M, “Rhoptry neck protein ron2 forms a complex with microneme protein ama1 in plasmodium falciparum merozoites,” Parasitol Int., vol. 58(1), pp. 29–35, 2009. 153 [264] Curtidor H, Patino LC, Arevalo-Pinzon G, Patarroyo ME, Patarroyo MA, “Identification of the plasmodium falciparum rhoptry neck protein 5 (pfron5),” Gene, vol. 474, pp. 22–28, 2011. [265] Ito D, Han E, Takeo S, Thongkukiatkul A, Otsuki H, Torii M, Tsuboi T, “Plasmodial ortholog of toxoplasma gondii rhoptry neck protein 3 is localized to the rhoptry body,” Parasitology International, vol. 60, pp. 132–138, 2011. 154 [266] Straub KW, Cheng SJ, Sohn CS, Bradley PJ, “Novel components of the apicomplexan moving junction reveal conserved and coccidia-restricted elements,” Cell Microbiol., vol. 11(4), pp. 590–603, 2009. 153 [267] Lacroix C, Ménard R, “Trap-like protein of plasmodium sporozoites: linking gliding motility to host-cell traversal,” Trends Parasitol., vol. 24(10), pp. 431–434, 2008. 154 [268] Morahan BJ, Wang L, Coppel RL, “No trap, no invasion,” Trends in Parasitology, vol. 25(2), pp. 77–84, 2008. 154
[269] Wang R, Charoenvit Y, Corradin G, De La Vega P, Franke ED, Hoffman SL, “Protection against malaria by plasmodium yoelii sporozoite surface protein 2 linear peptide induction of cd4+ t cell- and ifn-gamma-dependent elimination of infected hepatocytes,” The Journal of Immunology, vol. 157, pp. 4061–4067, 1996. 154 [270] Mota MM, Pradel G, Vanderberg JP, Hafalla JCR, Frevert U, Nussenzweig RS, Nussenzweig V, Rodriguez A, “Migration of plasmodium sporozoites through cells before infection,” Science, vol. 291, pp. 141–144, 2001. 154 [271] Heiss K, Nie H, Kumar S, Daly TM, Bergman LW, Matuschewski K, “Functional characterization of a redundant plasmodium trap family invasin, trap-like protein, by aldolase binding and a genetic complementation test,” Eukaryot Cell., vol. 7(6), pp. 1062–1070, 2008. 155 [272] Moreira CK, Templeton TJ, Lavazec C, Hayward RE, Hobbs CV, Kroeze H, Janse CJ, Waters AP, Sinnis P, Coppi A, “The plasmodium trap/mic2 family member, trap-like protein (tlp), is involved in tissue traversal by sporozoites,” Cell Microbiol., vol. 10(7), pp. 1505–1516, 2008. 155 [273] Praper T, Sonnen A, Viero G, Kladnik A, Froelich CJ, Anderluh G, Serra MD, Gilbert RJC, “Human perforin employs different avenues to damage membranes,” The Journal of Biological Chemistry, vol. 286(4), pp. 2946–2955, 2011. 155 [274] Amino R, Giovannini D, Thiberge S, Gueirard P, Boisson B, Dubremetz JF, Prevost MC, Ishino T, Yuda M, Menard R, “Host cell traversal is important for progression of the malaria parasite through the dermis to the liver,” Cell Host Microbe, vol. 3, pp. 88–96, 2008. 155 [275] Chitnis CE, Miller LH, “Identification of the erythrocyte binding domains of plasmodium vivax and plasmodium knowlesi proteins involved in erythrocyte invasion,” J. Exp. Med., vol. 180, pp. 497–506, 1994. 
155 [276] Perlaza B, Sauzet J, Brahimi K, BenMohamed L, Druilhe P, “Interferong, a valuable surrogate marker of plasmodium falciparum pre-erythrocytic stages protective immunity,” Malaria Journal, vol. 10, pp. 27–35, 2011. 156
[277] Richard D, MacRaild CA, Riglar DT, Chan JA, Foley M, Baum J, Ralph SA, Norton RS, Cowman AF, “Interaction between Plasmodium falciparum apical membrane antigen 1 and the rhoptry neck protein complex defines a key step in the erythrocyte invasion process of malaria parasites,” J Biol Chem., vol. 285(19), pp. 14815–14822, 2010. 157, 190
[278] Coley AM, Gupta A, Murphy VJ, Bai T, Kim H, Foley M, Anders RF, Batchelor AH, “Structure of the malaria antigen AMA1 in complex with a growth-inhibitory antibody,” PLoS Pathog., vol. 3(9), pp. 1308–1319, 2007. 157, 190
[279] Asch AS, Silbiger S, Heimer E, Nachman RL, “Thrombospondin sequence motif (CSVTCG) is responsible for CD36 binding,” Biochem Biophys Res Commun., vol. 182(3), pp. 1208–1217, 1992. 158, 190
[280] Da Silva E, Foley M, Dluzewski AR, Murray LJ, Anders RF, Tilley L, “The Plasmodium falciparum protein RESA interacts with the erythrocyte cytoskeleton and modifies erythrocyte thermal stability,” Mol Biochem Parasitol., vol. 66(1), pp. 59–69, 1994. 159, 190
[281] Myszka DG, Sweet RW, Hensley P, Brigham-Burke M, Kwong PD, Hendrickson WA, Wyatt R, Sodroski J, Doyle ML, “Energetics of the HIV gp120-CD4 binding reaction,” PNAS, vol. 97(16), pp. 9026–9031, 2000. 160, 166
[282] Mo M, Lee HC, Kotaka M, Niang M, Gao X, Iyer JK, Lescar J, Preiser P, “The C-terminal segment of the cysteine-rich interdomain of Plasmodium falciparum erythrocyte membrane protein 1 determines CD36 binding and elicits antibodies that inhibit adhesion of parasite-infected erythrocytes,” Infect Immun., vol. 76(5), pp. 1837–1847, 2008. 165
[283] Shu W, Ji H, Lu M, “Interactions between HIV-1 gp41 core and detergents and their implications for membrane fusion,” J Biol Chem., vol. 275(3), pp. 1839–1845, 2000. 171, 172, 176, 179
[284] Myers G, Korber B, Wain-Hobson S, Jeang K-T, Henderson LE, Pavlakis GN, Human Retroviruses and AIDS 1994: A Compilation and Analysis of Nucleic Acid and Amino Acid Sequences. Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, 1994. 187
[285] Betel D, Breitkreuz KE, Isserlin R, Dewar-Darch D, Tyers M, Hogue CWV, “Structure-templated predictions of novel protein interactions from sequence information,” PLoS Computational Biology, vol. 3(9), pp. 1783–1789, 2007. 193
[286] Skrabanek L, Saini HK, Bader GD, Enright AJ, “Computational prediction of protein-protein interactions,” Mol Biotechnol., vol. 38(1), pp. 1–17, 2008. 193
Appendix A

List of published papers
1. Nwankwo N, Seker H. 2010. “Assessment of the Binding Characteristics of Human Immunodeficiency Virus Type 1 Glycoprotein120 and Host Cluster of Differentiation4 Using Digital Signal Processing”. Conf. Proc. IEEE BIBE 2010:289-290.
2. Nwankwo N, Seker H. 2010. “A signal processing-based bioinformatics approach to assessing drug resistance: human immunodeficiency virus as a case study”. Conf. Proc. IEEE Eng Med Biol Soc. 2010:1836-1839.
3. Nwankwo N, Seker H. 2011. “Digital Signal Processing Techniques: Calculating the Biological Functionalities of Proteins”. J Proteomics Bioinform 4: 260-268. doi:10.4172/jpb.1000199.
4. Nwankwo N, Seker H. 2011. “Preliminary Investigations into the Binding Interactions between Plasmodial and Host Proteins Using Computational Approaches”. J Proteomics Bioinform 4: 269-277. doi:10.4172/jpb.1000200.
5. Nwankwo N, Seker H. 2012. “HIV Progression to AIDS: A Bioinformatic Approach to Determining the Mechanism of Action”. Current HIV Research Journal.

List of papers under preparation
1. Nwankwo N, Seker H. 2012. “Bioinformatics-based Approach to Characterization and Identification of HIV Tropic and Phenotypic Associations”.
2. Nwankwo N, Seker H. 2012. “Continuous Wavelet Transform-Based Study of the Connecting Peptides: HIV gp41 and 1DF5 as a Case Study”.