Critical Assessment of the Methods and the Features Used for Hot Spot Residue Prediction at Protein-Protein Interfaces

Selin Karagulle, Ozlem Keskin, Attila Gursoy

Center for Computational Biology and Bioinformatics (CCBB), Koç University, Istanbul, Turkey

{skaragulle, okeskin, agursoy}@ku.edu.tr

BACKGROUND

Proteins interact through interfaces. A few interface residues contribute most of the binding free energy; these residues are called hot spots.

Hot spots are determined experimentally by alanine scanning, which has a high cost and is not feasible at large scale.

METHODOLOGY

Datasets

Interface residues of complexes with no 3D structure, of single chains, and of DNA complexes are eliminated.

Interface residues whose mutation changes the binding free energy by at least 2.0 kcal/mol, and those whose interaction strength is labeled 'strong', are considered hot spots.

In total, 1206 residues from 66 complexes were collected from 12 different sources.

Training set vs. Testing set

Training set: interface residues whose observed change in binding free energy is ≥ 2.0 kcal/mol are considered hot spots.

Testing set: residues whose interaction strength is 'strong' are labeled as hot spots, and all others are tagged as non-hot spots.
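The two labeling rules for the training and testing sets can be sketched in a few lines. This is a minimal illustration, not the code used for the poster: the function names, the "H"/"NH" return values and the case-insensitive matching of 'strong' are assumptions.

```python
# Threshold stated on the poster: change in binding free energy in kcal/mol.
DDG_THRESHOLD = 2.0


def label_by_ddg(ddg_kcal_per_mol):
    """Training-set rule: hot spot (H) if the change is >= 2.0 kcal/mol."""
    return "H" if ddg_kcal_per_mol >= DDG_THRESHOLD else "NH"


def label_by_strength(interaction_strength):
    """Testing-set rule: hot spot (H) only if the strength is 'strong'.

    Stripping whitespace and ignoring case is an assumption made here
    for robustness; the poster only names the label 'strong'.
    """
    return "H" if interaction_strength.strip().lower() == "strong" else "NH"


if __name__ == "__main__":
    print(label_by_ddg(2.4), label_by_ddg(0.7))
    print(label_by_strength("strong"), label_by_strength("weak"))
```

Keeping the two rules in separate functions mirrors the fact that the training and testing sets are labeled from different kinds of annotation.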

OBJECTIVES In recent studies, lots of features for hot spot prediction were used.These features were tested on different data sets by implementing various machine learning methods. Comparison of these studies for the accuracy of hot spot prediction is difficult. We aim to provide single data and feature set for hot spot prediction using machine learning methods, to generate nonredundant data set and to design gold standard database, HOTBASE

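[Table: numbers of hot spots (H) and non-hot spots (NH) in the training (Train) and testing (Test) sets. Extracted counts, whose cell assignment is not recoverable: 148, 268, 134, 656.]

Features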

Accessible Surface Area

Pair Potentials

Knowledge-based, solvent-mediated inter-residue potentials extracted from protein interfaces are used in this work.

Evolutionary Conservation Score

Residue conservation scores are computed with the Rate4Site (R4S) algorithm and obtained from the ConSurf Server Database.

RESULTS

We implemented a random forest model on our nonredundant data set using the following features: residue category (six classes based on the dipoles and volumes of the side chains), secondary structure, atom contacts and atom contact areas, residue contacts, physicochemical features, ASA, and depth index. There are 57 features for every residue, as in the work of Wang et al. [2].

BeatMusic

BeatMusic [1] evaluates the change in binding affinity between proteins (or protein chains) caused by single-site mutations in their sequence. The predictions are based on the structure of the protein-protein complex.

[Figure: shares of TP, TN, FP and FN, with Accuracy, Recall, Precision and F-measure. Extracted values without recoverable label assignment: 0.771, 0.520, 0.358, 0.273; 6%, 17%, 6%, 71%.]
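Accuracy, Recall, Precision and F-measure are all derived from the TP, TN, FP and FN counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the poster's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, recall, precision and F-measure from confusion counts.

    Ratios with an empty denominator are reported as 0.0, which is a
    convention chosen here, not something stated on the poster.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=30, tn=50, fp=10, fn=10))
```

Because hot spots are a minority of interface residues, accuracy alone can look high even when recall and precision are low, which is why all four values are reported together.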

#   Feature                      MeanDecreaseAccuracy
1   mass_target                  6.512
2   atom_contact_areas_target    6.136
3   polarity_target              5.764
4   isopoint_target              5.666
5   residue_contacts_target      5.246
6   polarizability_target        5.017
7   rel_dpx_target               4.949
8   ave_dpx_target               4.683
9   isopoint_mirror              4.674
10  rel_ASA_target               4.523
11  rel_s_ch_dpx_target          4.522
12  atom_contacts_target         4.297
13  rel_s_ch_ASA_target          4.262
14  rel_ASA_intra                4.126
15  polarity_mirror              3.975
16  hydrophilicity_target        3.945
17  s_ch_ave_dpx_target          3.916
18  mass_intra                   3.777
19  sd_s_ch_dpx_target           3.638
20  class_mirror                 3.209
21  hydrophobicity_intra         3.077
22  rel_s_ch_ASA_mirror          3.039
23  hydrophobicity_target        3.037

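[Figure: results on the training and testing sets (Accuracy, Recall, Precision, F-measure; shares of TP, TN, FP and FN), including the panel "Relative ASA in complex state and Pair potentials". Extracted values without recoverable label assignment: 0.896, 0.632, 0.477, 0.530, 0.555, 0.514; 8%, 17%, 18%.]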

Physicochemical Features

The physicochemical characteristics of an amino acid are hydrophobicity, hydrophilicity, polarity, polarizability, propensities, isoelectric point, mass, and average accessible surface area.

Support Vector Machines

Support Vector Machines (SVMs) are a class of supervised learning algorithms that can learn a linear decision boundary separating the classes with maximum margin.
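A linear decision boundary of this kind can be illustrated without any training code. The sketch below only evaluates an already-learned boundary; the weights, bias and feature values are hypothetical, and the margin formula 2/||w|| assumes the usual scaling in which the support vectors satisfy |w·x + b| = 1.

```python
import math


def decision_value(weights, bias, features):
    """Signed score w.x + b of a sample relative to the boundary."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(weights, bias, features):
    """Label a residue as hot spot (H) on the non-negative side, else NH."""
    return "H" if decision_value(weights, bias, features) >= 0 else "NH"


def margin_width(weights):
    """Width of the margin, 2 / ||w||, under the canonical scaling."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))


if __name__ == "__main__":
    w, b = [3.0, 4.0], -5.0  # hypothetical learned parameters
    print(classify(w, b, [1.0, 1.0]), classify(w, b, [0.0, 0.0]))
    print(margin_width(w))
```

Training an SVM amounts to choosing the weights that make this margin as wide as possible while keeping the two classes on opposite sides.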


Residue Contacts

Two residues are considered to be in contact if at least one pair of atoms, one from each residue, is in contact.
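The residue-contact rule reduces to an "any pair of atoms" test. In the sketch below the atom-level test is a plain distance cut-off, which is only a stand-in: the poster takes atom contacts from the CSU program, and the 4.0 Å value, the function names and the coordinates are assumptions.

```python
import math
from itertools import product

# Hypothetical distance cut-off in angstroms, standing in for the
# CSU-defined atom contacts used on the poster.
CONTACT_CUTOFF = 4.0


def atoms_in_contact(atom_a, atom_b, cutoff=CONTACT_CUTOFF):
    """Stand-in atom contact test on (x, y, z) coordinates."""
    return math.dist(atom_a, atom_b) <= cutoff


def residues_in_contact(residue_a, residue_b, cutoff=CONTACT_CUTOFF):
    """True if at least one atom pair across the two residues is in contact."""
    return any(
        atoms_in_contact(a, b, cutoff) for a, b in product(residue_a, residue_b)
    )


if __name__ == "__main__":
    res1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    res2 = [(5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    print(residues_in_contact(res1, res2))
```

Swapping `atoms_in_contact` for a lookup into CSU output would keep the residue-level logic unchanged.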

Machine Learning Algorithms


Atom Contacts and Atom Contact Areas

The contact between two atoms (atom_contact) is defined by the CSU program.

Depth Index

The depth of an atom is the distance to its closest solvent-accessible atom.
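The depth definition is a nearest-neighbour distance. A minimal sketch, assuming atoms are given as (x, y, z) tuples and that the set of solvent-accessible atoms has already been determined by some other tool; the function name and coordinates are hypothetical.

```python
import math


def atom_depth(atom, accessible_atoms):
    """Distance from `atom` to the closest solvent-accessible atom.

    `accessible_atoms` must be non-empty. If `atom` itself is in the
    list, its depth is 0.0.
    """
    return min(math.dist(atom, other) for other in accessible_atoms)


if __name__ == "__main__":
    buried = (0.0, 0.0, 0.0)
    surface = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
    print(atom_depth(buried, surface))
```

Residue-level depth features, such as averages over side-chain atoms, can then be built by aggregating these per-atom values.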

We used the model constructed in [3]: a residue is labeled "Hotspot" if its relative ASA in the complex state is ≤ 20% and its pair potential is ≥ 18.0, and "NonHotspot" otherwise.
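The rule-based model from [3] can be written as a two-threshold test. The sketch below is an illustration under assumptions: the thresholds (relative ASA in the complex ≤ 20%, pair potential ≥ 18.0) are those reported for HotPoint and should be checked against [3], and the function and constant names are hypothetical.

```python
# Thresholds as reported for HotPoint [3]; treat them as assumptions here.
MAX_REL_ASA_COMPLEX = 20.0   # percent
MIN_PAIR_POTENTIAL = 18.0


def hotpoint_label(rel_asa_complex, pair_potential):
    """Label a residue from its relative ASA in the complex and pair potential."""
    buried = rel_asa_complex <= MAX_REL_ASA_COMPLEX
    strong_contacts = pair_potential >= MIN_PAIR_POTENTIAL
    return "Hotspot" if buried and strong_contacts else "NonHotspot"


if __name__ == "__main__":
    print(hotpoint_label(12.5, 21.0))
    print(hotpoint_label(35.0, 21.0))
```

Both conditions must hold, so a buried residue with weak contacts and an exposed residue with strong contacts are both labeled "NonHotspot".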

Category of residues and secondary structure

Residues are grouped into six classes based on the dipoles and volumes of their side chains. There are three types of secondary structure: helix, strand and loop.


MeanDecreaseAccuracy: The average decrease of classification accuracy when the values of a particular feature are randomly permuted on the out-of-bag samples.
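The permutation idea behind MeanDecreaseAccuracy can be sketched independently of any particular forest implementation. The version below is a simplification: it permutes one feature column on a single evaluation sample and uses an arbitrary predictor, whereas a random forest does this per tree on that tree's out-of-bag samples. All names and the toy data are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the predictor returns the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def mean_decrease_accuracy(predict, X, y, feature, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = []
        for row, value in zip(X, column):
            new_row = list(row)
            new_row[feature] = value  # only this feature is disturbed
            X_perm.append(new_row)
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats


if __name__ == "__main__":
    X = [[2.0, 1.0], [5.0, 0.0], [8.0, 1.0], [12.0, 0.0],
         [30.0, 1.0], [45.0, 0.0], [60.0, 1.0], [80.0, 0.0]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    rule = lambda row: 1 if row[0] < 20 else 0  # uses feature 0 only
    print(mean_decrease_accuracy(rule, X, y, feature=0))
    print(mean_decrease_accuracy(rule, X, y, feature=1))
```

A feature the predictor never uses gets an importance of exactly zero, while shuffling an informative feature lowers the accuracy.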


Sequence Profile

The sequence profile is obtained by a PSI-BLAST search against the NCBI non-redundant database. The BLOSUM62 substitution matrix and an E-value threshold of 0.001 are used as parameters.


Sequence Entropy

The sequence entropy value for each residue is obtained from the HSSP database.
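For intuition about what a per-residue entropy measures, the sketch below computes the Shannon entropy of one alignment column. It is not the HSSP procedure: the poster reads the values from the database, and the natural logarithm, the skipping of gap characters and the function name are assumptions made here.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (natural log) of the residues in one alignment column.

    Gap characters are ignored. A fully conserved column gives 0.0;
    more variable columns give larger values.
    """
    residues = [aa for aa in column if aa != gap]
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


if __name__ == "__main__":
    print(column_entropy("AAAA"))
    print(column_entropy("AG-AG"))
```

Low entropy at an interface position indicates conservation across homologous sequences, which is why it is used as a feature for hot spot prediction.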


Random Forest Model

Random forest (RF) is an ensemble classification algorithm that employs a collection of decision trees to reduce the output variance of individual trees, and thus improves the stability and accuracy of classification.
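The ensemble idea can be shown with a deliberately small stand-in. The sketch below trains one-split trees (decision stumps) on bootstrap samples and combines them by majority vote; a real random forest grows deeper trees and also samples a random subset of features at each split, which is omitted here. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a single-threshold rule on one feature with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [1 if sign * (row[j] - t) >= 0 else 0 for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return lambda row: 1 if sign * (row[j] - t) >= 0 else 0


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees


def predict(trees, row):
    """Majority vote over the individual trees."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[2.0, 0.0], [5.0, 1.0], [8.0, 0.0], [12.0, 1.0],
         [30.0, 0.0], [45.0, 1.0], [60.0, 0.0], [80.0, 1.0]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    forest = fit_forest(X, y)
    print(predict(forest, [1.0, 0.0]), predict(forest, [70.0, 1.0]))
```

Individual stumps differ because each sees a different bootstrap sample; voting averages out those differences, which is the variance reduction described above.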


We implemented the support vector machines algorithm using hydrophobicity, hydrophilicity, polarity, polarizability, propensities, average accessible surface area, sequence profile, evolutionary conservation score and sequence entropy on the dataset of Chen et al. [4].

We implemented the support vector machines algorithm using all features on our nonredundant dataset.

[Figures: "Results of all features" and "SVM Results of All features", reporting Accuracy, Recall, Precision and F-measure for 10-fold cross validation and for evaluation on the test set, with row labels Ab+ and Ab-. Extracted values without recoverable label assignment: 0.573, 0.604, 0.575, 0.6, 0.593, 0.560, 0.424, 0.523, 0.537, 0.514, 0.576, 0.519, 0.45, 0.402, 0.309, 0.235, 0.152, 0.674, 0.488, 0.481, 0.483, 0.848, 0.617, 0.569.]

CONCLUSION

We have a nonredundant dataset.

Our database includes values of many features that are used for machine learning methods and give highly accurate results.

Once the database is completed, it will be published at http://prism.ccbb.ku.edu.tr/hotbase/

REFERENCES

1. Dehouck Y, Kwasigroch JM, Rooman M, Gilis D: BeAtMuSiC: Prediction of changes in protein-protein binding affinity upon mutations. Nucleic Acids Research 2013. doi: 10.1093/nar/gkt450

2. Wang L, Liu Z-P, Zhang X-S, Chen L: Prediction of hot spots in protein interfaces using a random forest model with hybrid features. Protein Eng. Des. Sel. 2012, 25:119–126.

3. Tuncbag N, Keskin O, Gursoy A: HotPoint: hot spot prediction server for protein interfaces. Nucleic Acids Research 2010, 38:W402–W406.

4. Chen R, Chen W, Yang S, Wu D, Wang Y, Tian Y, Shi Y: Rigorous assessment and integration of the sequence and structure based features to predict hot spots. BMC Bioinformatics 2011, 12:311.

We thank the Scientific and Technological Research Council of Turkey (TUBITAK) for funding.
