Protein & Peptide Letters, 2013, 20, 249-254

A Protein Block Based Fold Recognition Method for the Annotation of Twilight Zone Sequences

V. Suresh, K. Ganesan and S. Parthasarathy*

Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024, Tamil Nadu, India

Abstract: The description of the protein backbone has recently been improved by a set of structural fragments called Structural Alphabets, which replace the conventional three-state (helix, sheet and coil) secondary structure description. Protein Blocks is one such Structural Alphabet and describes every region of the protein backbone, including the coil. As defined by de Brevern (2000), the Protein Blocks comprise 16 structural fragments, each five residues long. Among the available Structural Alphabets, Protein Blocks are highly informative, and they have been used in many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, the predicted Protein Blocks of a query amino acid sequence are aligned, by local pairwise alignment, against a library of assigned Protein Blocks for 953 known folds. Alignments with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method recognizes the possible fold for nearly 35.5% of the twilight zone sequences, using Protein Block sequences predicted by pb_prediction, which is available at the Protein Block Expert server.

Keywords: Local protein structure, pairwise local alignment, protein block, protein fold recognition, secondary structure, structural alphabet, twilight zone sequences.

INTRODUCTION

Identifying the fold of a new protein from its sequence is an important area in protein structure and function prediction. Many computational methods are used to predict the three-dimensional (3D) structure of a protein from its sequence. Broadly, these methods fall into two major classes: (i) sequence-sequence alignment and (ii) sequence-profile alignment [1]. Widely used sequence-sequence alignment techniques include the Needleman-Wunsch [2] and Smith-Waterman [3] dynamic programming algorithms, BLAST [4] and FASTA [5]. However, these simple sequence-sequence alignment methods are reliable only when the two sequences share high identity. They fail for proteins whose sequence identity lies between 15% and 25%, known as Twilight Zone (TZ) sequences [6], even though such sequences may share similar 3D structures. In these cases, simple sequence alignment is not a reliable way to detect the correct fold. To find the fold of such remote homologs, sequence-profile alignment methods such as PSI-BLAST [7] and IMPALA [8] are used, but their performance in predicting the fold of TZ sequences still needs improvement.

*Address correspondence to this author at the Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024, Tamil Nadu, India; Tel: +91 431 2407071 Extn: 655 (Off); Mob: +91 94435 33095; Fax: +91 431 2407045; E-mail: [email protected] 1875-5305/13 $58.00+.00

© 2013 Bentham Science Publishers

When the identity between two sequences is very low and lies in the TZ (15-25%), protein fold recognition [9, 10] is an option for identifying the correct fold. Here, the fold is recognized by comparing the query sequence with known 3D structures. Protein fold recognition methods are generally grouped into three main categories: (1) structure-seeded profiles, (2) profile-profile alignment and (3) machine learning techniques [1]. Well-developed methods in these three classes include FFAS03 [11], 3D-PSSM [12], FUGUE [13], mGenThreader [14], ORFeus [15], MUSTER [16], SP5 [17] and PSS-3D1D [18]. Sequence-profile and other fold recognition techniques rely mainly on similarities in amino acid composition and secondary structure features between the query and known 3D folds [19]. The problem is that most secondary-structure-based fold recognition methods do not describe the coil regions, which comprise 50% of protein structures. To overcome this problem, over the last ten years protein 3D structures have been described as series of small fragments called Structural Alphabets (SA) [20]. Many researchers have defined SAs with a limited number of short fragments, and SAs have proved efficient at approximating every part of the protein backbone, including the coil. One such SA, Protein Blocks (PB), defined by de Brevern [21], consists of 16 fragments, each five residues long, denoted by the letters a to p. PBs encode the 3D structure of a protein as a one-dimensional (1D) string of PB letters. Methods for assigning PBs to a known protein structure and for predicting PBs from amino acid sequences have been developed and used in many applications [21-25].

Important achievements with PBs include the prediction of long fragments [23, 24, 26-30] and short loops [31-33], protein contact analysis [34], the building of globular [35] and transmembrane protein structures [36-38], the definition of binding site signatures [39], and the definition of a reduced amino acid alphabet for mutation design [40]. Among these applications, the PB substitution matrix [41] has been used for structure-based PB sequence alignment [42] and large-scale protein structure comparison [43]. Benchmark results showed that the PB-based approach is the most effective method for mining the PDB and identifying proteins with similar 3D structures [43]. Compared with other existing structural alphabets, PB has the best predictive ability [44]. Moreover, PB generalizes the concept of secondary structure: it clearly describes backbone conformations such as turns and the N- and C-caps of helices and sheets, and this information can be used to improve protein fold recognition. In this study, we apply PB to protein fold recognition and have implemented the method as a web server, PredictFold-PB, for the annotation of TZ sequences. To our knowledge, no other PB-based fold recognition method exists. Our method was tested with a TZ dataset sharing 15-25% sequence identity with our fold library. The results indicate that PredictFold-PB can recognize the possible fold for nearly 35.5% of the TZ sequences, using PB sequences predicted by the pb_prediction server [45], which is available at the Protein Block Expert server [42].

MATERIALS AND METHODS

Data set

Our PredictFold-PB server uses a data set of 953 of the 1195 folds in the SCOP (release 1.75) database [46].
For each fold, a representative 3D structure was extracted from the PDB [47] using the following criteria: (i) for X-ray structures, only data with a resolution of ≤ 3.0 Å were used, and (ii) for NMR structures, the OLDERADO server [48] was used to select the best model. The final fold library of 953 folds was thus constructed from both X-ray (848 proteins) and NMR (105 models) structures. The remaining 242 SCOP folds were not considered because of non-standard and/or missing amino acids in their ATOM records.

Protein Block Assignment

Protein Blocks [21] are a set of 16 fragments, denoted by the letters a to p. Each fragment is five residues long, and the blocks were defined on the basis of the backbone dihedral angles φ and ψ. Assigning PBs to a protein structure involves the following steps: (i) cut the 3D backbone into overlapping fragments of five residues, (ii) compute the eight dihedral angles (φ, ψ) of each fragment, (iii) calculate the root mean square deviation on angular values (RMSDa) between the observed (φ, ψ) values of the fragment and the ideal (φ, ψ) values [21] of each of the 16 PBs, (iv) find the lowest of the 16 RMSDa values and (v) assign the corresponding PB to the central residue of the fragment. The overlapping fragments of the whole protein backbone were encoded in this way, so a protein of length n yields n − 4 fragments. PBs cannot be assigned to the first two and last two residues of a structure; to match the length of the PB sequence to the corresponding amino acid sequence, we therefore added two ‘z’ letters as a prefix and as a suffix of the PB sequence. The same procedure was applied to all 953 folds in the dataset, and the resulting PB sequences were stored in the PB fold library.

Protein Block Prediction

Just as PB sequences can be assigned from a protein's 3D structure, they can also be predicted from its amino acid sequence. Many PB prediction methods [25] have been developed, using different statistical approaches based on sequence as well as evolutionary information. The knowledge-based pb_prediction server achieves 62.0% prediction accuracy [25], the best among the online PB prediction servers. In this study, we therefore predicted PB sequences with the Offmann knowledge-based method through the pb_prediction server [45], which is available at Protein Block Expert (PBE) [42].

Substitution Matrix and Gap Penalties

A PB substitution matrix was developed by Tyagi et al. [42] and has been used extensively for mining protein structures [43] and for large-scale protein structure comparison [42]. That work not only demonstrated the efficiency of the substitution matrix but also found that -5 is an optimal gap-opening penalty for PB sequence comparison. In our method, we used the Offmann revised PB substitution matrix [20] for pairwise alignment of PB sequences. We also tested gap-opening penalties of -2, -3, -4 and -5 with a gap-extension penalty of -1 (data not shown) and fixed -5 and -1 as the optimal gap-opening and gap-extension penalties for PB sequence alignment.
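The five assignment steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' Perl implementation: the function names are hypothetical, and `TOY_REFS` holds two made-up helix-like and strand-like reference vectors labelled 'm' and 'd' as stand-ins; a real assignment needs the 16 published reference angle sets from [21]. The eight dihedrals of the window centred on residue i are taken as ψ(i−2), φ(i−1), ψ(i−1), φ(i), ψ(i), φ(i+1), ψ(i+1), φ(i+2), which is the usual PB convention.

```python
import math

# Toy reference vectors (8 dihedrals each, degrees, alternating psi/phi).
# These are NOT the published PB values from [21]; they are rough
# helix-like ('m') and strand-like ('d') stand-ins for illustration only.
TOY_REFS = {
    "m": [-47.0, -57.0] * 4,
    "d": [135.0, -120.0] * 4,
}

def angular_diff(a, b):
    """Signed difference a - b wrapped into (-180, 180] degrees."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

def rmsda(observed, reference):
    """Root mean square deviation on angular values (RMSDa)."""
    sq = [angular_diff(o, r) ** 2 for o, r in zip(observed, reference)]
    return math.sqrt(sum(sq) / len(sq))

def assign_pb(phi, psi, references, pad="z"):
    """Assign one PB letter per residue from per-residue phi/psi angles.

    A protein of length n (assumed n >= 5) gives n - 4 overlapping
    five-residue windows; the first two and last two residues cannot be
    assigned and receive `pad`. phi[0] and psi[-1], which are undefined
    at the chain ends, are never read.
    """
    n = len(phi)
    letters = []
    for c in range(2, n - 2):  # c is the central residue of the window
        window = [psi[c - 2], phi[c - 1], psi[c - 1], phi[c],
                  psi[c], phi[c + 1], psi[c + 1], phi[c + 2]]
        # Steps (iv)-(v): the PB with the lowest RMSDa labels the centre.
        best = min(references, key=lambda pb: rmsda(window, references[pb]))
        letters.append(best)
    return pad * 2 + "".join(letters) + pad * 2

# A short ideal helix: every window matches the helix-like stand-in.
print(assign_pb([-57.0] * 8, [-47.0] * 8, TOY_REFS))  # zzmmmmzz
```

Wrapping angle differences into (-180, 180] matters here: without it, ψ values of 175° and -175° would count as 350° apart instead of 10°, and the RMSDa ranking of blocks would be distorted.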
Protein Fold Recognition - PredictFold-PB

The steps of the PredictFold-PB fold recognition method are: (i) pairwise local alignment between the predicted PB sequence and all 953 PB sequences in the PB fold library; (ii) calculation of the statistical significance scores, the z-value [49] and the P-value [50], using

$$\mu = \alpha \ln n + \beta, \tag{1}$$

$$z = \frac{S - \mu}{\sigma}, \tag{2}$$

$$P = 1 - \exp\left(-e^{-z}\right), \tag{3}$$

where μ is the mean alignment score, α and β are linear regression coefficients and n is the length of the given sequence; S is the raw alignment score, σ is the standard deviation of the 953 raw scores, and z and P are the statistical scores; and (iii) sorting of the alignment results by z- and P-value and identification of the true hits, where alignments with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. We fixed the z and P cut-offs at 2.5 and 0.08, respectively, through benchmarking experiments with all 953 library folds and the TZ datasets, as discussed in the Results section.
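Steps (i)-(iii) can be sketched as a library scan. This is an illustrative Python sketch, not the authors' C program: `toy_score` is a hypothetical match/mismatch score standing in for the PB substitution matrix, the gap convention (a gap of length k scores −5 + (k − 1)(−1)) is an assumption because the text gives only the two penalties, σ is taken as the population standard deviation of the library scores, and α and β must be supplied from a regression that the text does not detail.

```python
import math
from statistics import pstdev

GAP_OPEN, GAP_EXTEND = -5.0, -1.0  # penalties fixed in the text

def toy_score(x, y):
    """Hypothetical PB score: +4 for identical letters, -2 otherwise.
    NOT the published PB substitution matrix."""
    return 4.0 if x == y else -2.0

def local_align_score(a, b, score=toy_score,
                      gap_open=GAP_OPEN, gap_extend=GAP_EXTEND):
    """Best Smith-Waterman local score with affine gaps (Gotoh recursion).
    Assumed convention: a gap of length k scores gap_open + (k-1)*gap_extend."""
    neg = float("-inf")
    m = len(b)
    prev_h = [0.0] * (m + 1)  # H for row i-1: best local score ending at (i-1, j)
    prev_f = [neg] * (m + 1)  # F for row i-1: best score ending with a gap in b
    best = 0.0
    for i in range(1, len(a) + 1):
        h = [0.0] * (m + 1)
        f = [neg] * (m + 1)
        e = neg  # E: best score ending with a gap in a, along row i
        for j in range(1, m + 1):
            e = max(h[j - 1] + gap_open, e + gap_extend)
            f[j] = max(prev_h[j] + gap_open, prev_f[j] + gap_extend)
            h[j] = max(0.0, prev_h[j - 1] + score(a[i - 1], b[j - 1]), e, f[j])
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best

def z_and_p(raw, library_scores, n, alpha, beta):
    """Eqs. (1)-(3): length-corrected mean, z-score and P-value."""
    mu = alpha * math.log(n) + beta       # Eq. (1)
    sigma = pstdev(library_scores)        # std. dev. of all raw scores
    z = (raw - mu) / sigma                # Eq. (2)
    p = 1.0 - math.exp(-math.exp(-z))     # Eq. (3)
    return z, p

def predict_folds(query_pb, library, alpha, beta, z_cut=2.5, p_cut=0.08):
    """Align the query against every library entry and keep significant hits."""
    raw = {fold: local_align_score(query_pb, pb) for fold, pb in library.items()}
    scores = list(raw.values())
    hits = []
    for fold, s in raw.items():
        z, p = z_and_p(s, scores, len(query_pb), alpha, beta)
        if z >= z_cut and p <= p_cut:
            hits.append((fold, s, z, p))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)  # rank by z

# At the cut-off z = 2.5, Eq. (3) gives P of about 0.079, consistent with P <= 0.08.
print(round(1.0 - math.exp(-math.exp(-2.5)), 3))  # 0.079
```

Because Eq. (3) decreases monotonically in z, the condition z ≥ 2.5 already implies P ≤ 0.079, so in this sketch the P-value cut-off of 0.08 adds almost no extra filtering. If every library score were identical, σ would be zero and the z-value undefined; a real implementation would need to guard against that.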


Assessment of Fold Prediction Ability

The accuracy of the PredictFold-PB fold recognition method was measured with the Receiver Operating Characteristic (ROC) curve [51]. We plot the true positive rate (TPR), or sensitivity, against the false positive rate (FPR) at different z-score cut-offs. A larger area under the ROC curve indicates higher accuracy. TPR and FPR were calculated using

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \tag{4}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN}, \tag{5}$$

where TP and FP denote the numbers of true positives and false positives, and TN and FN denote the numbers of true negatives and false negatives, respectively. The area under the ROC curve was calculated with the trapezoidal method.

RESULTS AND DISCUSSION

The performance of the PredictFold-PB method was measured in two benchmarking experiments. The first was performed on a self dataset of all 953 sequences in the PB fold library, and the second on a TZ dataset sharing 15-25% sequence identity with the PB fold library.

Preparation of Self Dataset

The amino acid sequences of all 953 library folds were extracted from their PDB ATOM records, and their PB sequences were predicted with the pb_prediction server available at PBE [42]. Of the 953 folds, the server predicted complete PB sequences for only 712; the remaining 241 had missing PB letters, so ‘Z’ had to be inserted to match the length of the predicted PB sequence with that of the amino acid sequence. We therefore performed the self benchmark on two subsets: (i) Self_953_dataset_I, containing all 953 predicted PB sequences, including the incompletely predicted ones, and (ii) Self_712_dataset_II, containing only the 712 completely predicted PB sequences.

Preparation of TZ Dataset

The TZ dataset was prepared from the SCOP 25 dataset using the EMBOSS needle program, selecting sequences whose identity lies between 15% and 25%. In total, 396 proteins were obtained, and their PB sequences were predicted with the pb_prediction server. Complete PB sequences were obtained for only 243 of the 396 proteins; the remaining 153 had missing PBs. We therefore performed the TZ benchmark on TZ_243_dataset_III, which contains only these 243 TZ proteins, belonging to all seven SCOP classes.
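The identity-window selection used to build the TZ dataset, and the later split into TZ_15-19 and TZ_20-25, can be sketched as follows. The sketch assumes identity is computed over the full alignment length, gap columns included, as EMBOSS needle reports it; the alignment itself is taken as given, and the helper names and the half-open boundary between the two subsets (a value such as 19.5% goes to TZ_15-19) are our assumptions, since the text gives integer ranges only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the whole alignment, gap columns included
    (assumed to mirror the 'Identity' line reported by EMBOSS needle)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)

def tz_subset(identity):
    """Hypothetical helper: map an identity value to its TZ subset, or None."""
    if 15.0 <= identity < 20.0:
        return "TZ_15-19"
    if 20.0 <= identity <= 25.0:
        return "TZ_20-25"
    return None  # outside the 15-25% twilight zone window

# 2 identical columns out of 10 gives 20% identity.
print(tz_subset(percent_identity("ACDEFGHIKL", "ACWWWWWWWW")))  # TZ_20-25
```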
Recognition of Correct SCOP Fold

We assessed how well PredictFold-PB recognizes the correct SCOP fold of a given amino acid sequence. Each of the three datasets, (i) Self_953_dataset_I, (ii) Self_712_dataset_II and (iii) TZ_243_dataset_III, was benchmarked individually. Each predicted PB sequence in a dataset was queried against all 953 assigned PB sequences in the PB fold library. For each alignment, the z- and P-values were calculated from the raw alignment score and used to identify the correct SCOP fold among the top-ranking alignments. In this study, a true hit was counted when the correct SCOP fold appeared among the top five ranking alignments. Across all seven SCOP classes, Self_953_dataset_I gives the correct fold at the top rank for 69.0% of folds and within the top five for 73.1%. Similarly, Self_712_dataset_II gives 84.5% at the top rank and 89.4% within the top five. As a special case for the self datasets, multi-domain, membrane and small proteins were excluded and true hits were analyzed for the four major SCOP classes only, which improves prediction. In this case, Self_953_dataset_I gives 71.3% at the top rank and 75.3% within the top five, while Self_712_dataset_II improves slightly to 85.1% at the top rank and remains at 89.4% within the top five. For TZ_243_dataset_III, our method predicts up to 31.2% of folds at the top rank and up to 39.0% within the top five across all seven SCOP classes. When the TZ dataset is restricted to the four major SCOP classes, performance improves slightly to 35.5% at the top rank and 42.5% within the top five.
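The top-rank and top-five true hit percentages reported above can be computed as in the sketch below. The input format is hypothetical: each query contributes its correct SCOP fold and the list of library folds ranked by decreasing z-value, and the fold identifiers in the example are placeholders.

```python
def true_hit_rates(results, ks=(1, 5)):
    """Percentage of queries whose correct fold appears within the top k ranks.

    results: list of (correct_fold, ranked_folds) pairs, with ranked_folds
    sorted by decreasing z-value (hypothetical input format).
    """
    rates = {}
    for k in ks:
        hits = sum(1 for correct, ranked in results if correct in ranked[:k])
        rates[k] = round(100.0 * hits / len(results), 1)
    return rates

example = [
    ("a.1", ["a.1", "b.2"]),                              # hit at rank 1
    ("b.1", ["c.1", "c.2", "c.3", "c.4", "b.1"]),         # hit at rank 5
    ("c.1", ["d.1", "d.2", "d.3", "d.4", "d.5", "c.1"]),  # rank 6: miss
]
print(true_hit_rates(example))  # {1: 33.3, 5: 66.7}
```

The same arithmetic applied to the Table 1 totals gives 541 of 605 (89.4%) and 515 of 605 (85.1%) for Self_712_dataset_II.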
The true hit results for the four major SCOP classes are given in Table 1 for both Self_712_dataset_II and TZ_243_dataset_III. We further divided TZ_243_dataset_III into two subsets by sequence identity to the PB fold library: (i) TZ_20-25, covering 102 proteins with 20-25% identity, and (ii) TZ_15-19, covering 141 proteins with 15-19% identity. Each subset was benchmarked separately for recognition of the correct SCOP fold. For TZ_20-25, the correct fold was predicted for 40.2% of proteins at the top rank and 48.0% within the top five, whereas for TZ_15-19 the corresponding values were only 24.8% and 32.6%. These results show that pairwise PB sequence alignment with the PB substitution matrix can recognize similar folds even when structural information is unknown. The cut-offs z-value ≥ 2.5 and P-value ≤ 0.08 are optimal for extracting proteins that share similar structural features. Moreover, across the twilight zone subsets, our method performs best for sequences with 20-25% identity to the PB fold library.


Table 1. Benchmark Results of Self and TZ Datasets

| SCOP class | Self_712_dataset_II: Total | Top 5 (%) | Top (%) | TZ_243_dataset_III: Total | Top 5 (%) | Top (%) |
|---|---|---|---|---|---|---|
| a. All α | 166 | 132 (79.5) | 126 (75.9) | 54 | 19 (35.2) | 17 (31.5) |
| b. All β | 116 | 111 (95.7) | 104 (89.7) | 51 | 21 (41.2) | 15 (29.4) |
| c. α/β | 102 | 99 (97.1) | 93 (91.2) | 22 | 12 (54.5) | 10 (45.5) |
| d. α+β | 221 | 199 (90.0) | 192 (86.9) | 73 | 33 (45.2) | 29 (39.7) |
| Total | 605 | 541 (89.4) | 515 (85.1) | 200 | 85 (42.5) | 71 (35.5) |

Values in parentheses are percentages of the total folds tested.

Performance Measurement with ROC Curve

The overall accuracy of the PredictFold-PB method was measured with the ROC curve. We first calculated the TPR and FPR for TZ_243_dataset_III at different z-value cut-offs and then measured the area under the ROC curve, which was 0.95 for TZ_243_dataset_III. Fig. (1) shows the ROC curve of TZ_243_dataset_III.
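Eqs. (4) and (5) and the trapezoidal area can be sketched as follows. This is an illustrative sketch, not the authors' code: each query-library pair is assumed to carry its z-value and a boolean saying whether the library fold is the correct one, and the ROC curve is closed with the end points (0, 0) and (1, 1) before integration.

```python
def roc_points(z_values, is_correct, cutoffs):
    """(FPR, TPR) pairs, Eqs. (5) and (4), for each z-value cut-off."""
    pos = sum(1 for c in is_correct if c)
    neg = len(is_correct) - pos
    points = []
    for t in cutoffs:
        tp = sum(1 for z, c in zip(z_values, is_correct) if z >= t and c)
        fp = sum(1 for z, c in zip(z_values, is_correct) if z >= t and not c)
        points.append((fp / neg, tp / pos))  # FP/(FP+TN), TP/(TP+FN)
    return points

def trapezoid_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

z = [3.0, 1.0, 2.0, 0.0]
correct = [True, True, False, False]
print(trapezoid_auc(roc_points(z, correct, [2.5, 1.5, 0.5])))  # 0.75
```

Sorting the points by FPR and then TPR keeps points that share the same FPR in staircase order, so each trapezoid spans a genuine step of the curve.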

Figure 1. ROC curve showing the performance of the PredictFold-PB fold recognition method for TZ_243_dataset_III (x-axis: False Positive Rate (FPR); y-axis: True Positive Rate (TPR)). The area under the curve is 0.95.

Our PredictFold-PB server is more sensitive for globular proteins and less sensitive for membrane, small and multidomain proteins. In future, the PredictFold-PB server will be updated as new SCOP releases become available.

Description of PredictFold-PB Server

We have developed a web server, PredictFold-PB, for our method. The PredictFold-PB homepage can be accessed at http://bioinfo.bdu.ac.in/~psa/. To predict the possible fold of a query protein, the user must submit both the amino acid sequence and its predicted PB sequence in FASTA format. PBs can be predicted with the pb_prediction server (http://bioinformatics.univ-reunion.fr/PBE/pb_prediction/) available at PBE; our homepage provides a direct link to this server, and users are asked to predict the PBs of their query sequence there. The PB sequence and the protein sequence can be submitted either by uploading a file or by pasting them directly into the form. Before submitting the form, the user must check that the two sequences have equal lengths. If they do not, the user must insert the letter ‘Z’ at the missing positions of the PB sequence; in such cases, however, the accuracy of the fold prediction will be lower. Once all inputs have been specified, the user presses the ‘predict fold’ button. The PredictFold-PB server then performs a pairwise local alignment of the query PB sequence against all 953 PB sequences in the PB fold library. Possible folds are predicted from the alignments with z-value ≥ 2.5 and P-value ≤ 0.08 and are marked ‘Predicted Fold’ in the Remarks column. The ‘Results’ page lists the predicted folds, identified by SCOP fold index, together with their alignment length, raw score, z-value and P-value. Each predicted fold has a SCOP link to its structural classification, and the PB-based amino acid sequence alignments are also displayed. The ‘Results’ page for the sample sequence provided in the supplementary data is shown in Fig. (2).

Figure 2. ‘Results’ page of the PredictFold-PB server showing predicted folds and their alignment.

IMPLEMENTATION

The PB fold library was created using Perl, and the PredictFold-PB program was written in C. PHP forms are used to obtain the user's query sequence, and a web interface built with Apache and PHP produces the ‘Results’ page.

CONCLUSIONS

In this study, we showed how Protein Blocks can be applied to recognize the correct SCOP fold of twilight zone sequences. Our results indicate that improvements in PB sequence prediction can further enhance our method's ability to recognize the correct fold. Our method is based purely on Protein Block alignment, and to our knowledge no other Protein Block based method exists for recognizing the fold of TZ sequences. Our future work will focus on improving the method by adding structural features of environmental coordinates in terms of 3D1D and secondary structure information. Work on identifying SCOP superfamilies in terms of PB is also in progress.

ABBREVIATIONS

1D = One dimensional
3D = Three dimensional
FPR = False Positive Rate
NMR = Nuclear Magnetic Resonance
PB = Protein Block
PBE = Protein Block Expert
PDB = Protein Data Bank
ROC = Receiver Operating Characteristic Curve
RMSDa = Root Mean Square Deviation on Angular value
SA = Structural Alphabet
SCOP = Structural Classification of Proteins
TPR = True Positive Rate
TZ = Twilight Zone

CONFLICT OF INTEREST

The authors confirm that this article content has no conflicts of interest.

ACKNOWLEDGEMENTS

This work forms part of a research project funded by the Department of Information Technology (DIT), Government of India, New Delhi. Two of the authors (VS & KG) thank DIT for Research Fellowships.

REFERENCES

[1] Yan, R.J.; Si, J.; Wang, C.; Zhang, Z. DescFold: A web server for protein fold recognition. BMC Bioinformatics, 2009, 10, 416.
[2] Needleman, S.B.; Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 1970, 48(3), 443-453.
[3] Smith, T.F.; Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol., 1981, 147(1), 195-197.
[4] Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol., 1990, 215(3), 403-410.
[5] Pearson, W.R. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol., 1990, 183, 63-98.
[6] Doolittle, R.F. Of urfs and orfs: A primer on how to analyze derived amino acid sequences; University Science Books: Mill Valley, CA, 1986.
[7] Altschul, S.F.; Madden, T.L.; Schaffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, 25(17), 3389-3402.
[8] Schaffer, A.A.; Wolf, Y.I.; Ponting, C.P.; Koonin, E.V.; Aravind, L.; Altschul, S.F. IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics, 1999, 15(12), 1000-1011.
[9] Bowie, J.U.; Luthy, R.; Eisenberg, D. A method to identify protein sequences that fold into a known three-dimensional structure. Science, 1991, 253, 164-169.
[10] Jones, D.T.; Taylor, W.R.; Thornton, J.M. A new approach to protein fold recognition. Nature, 1992, 358(6381), 86-89.
[11] Jaroszewski, L.; Rychlewski, L.; Li, Z.; Li, W.; Godzik, A. FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Res., 2005, 33(suppl 2), W284-W288.
[12] Kelley, L.A.; MacCallum, R.M.; Sternberg, M.J. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol., 2000, 299(2), 499-520.
[13] Shi, J.; Blundell, T.L.; Mizuguchi, K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol., 2001, 310(1), 243-257.
[14] McGuffin, L.J.; Jones, D.T. Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics, 2003, 19(7), 874-881.
[15] Ginalski, K.; Pas, J.; Wyrwicz, L.S.; von Grotthuss, M.; Bujnicki, J.M.; Rychlewski, L. ORFeus: Detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res., 2004, 31(13), 3804-3807.
[16] Wu, S.; Zhang, Y. MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins, 2008, 72(2), 547-556.
[17] Zhang, W.; Liu, S.; Zhou, Y. SP5: improving protein fold recognition by using torsion angle profiles and profile-based gap penalty model. PLoS ONE, 2008, 3(6), e2325.
[18] Ganesan, K.; Parthasarathy, S. PSS-3D1D: An improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences. J. Struct. Funct. Genomics, 2011, 12(4), 181-189.
[19] Russell, R.B.; Copley, R.R.; Barton, G.J. Protein fold prediction from secondary structure assignment. Proc. 29th Ann. Hawaii Int. Conf. Sys. Sci., IEEE Press, 1995, V, 302-311.
[20] Offmann, B.; Tyagi, M.; de Brevern, A.G. Local protein structures. Curr. Bioinf., 2007, 3, 165-202.
[21] de Brevern, A.G.; Etchebest, C.; Hazout, S. Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins, 2000, 41, 271-287.
[22] de Brevern, A.G. New assessment of a structural alphabet. In Silico Biol., 2005, 5, 283-289.
[23] de Brevern, A.G.; Valadie, H.; Hazout, S.; Etchebest, C. Extension of a local backbone description using a structural alphabet: a new approach to the sequence-structure relationship. Protein Sci., 2002, 11, 2871-2886.
[24] de Brevern, A.G.; Etchebest, C.; Benros, C.; Hazout, S. "Pinning strategy": a novel approach for predicting the backbone structure in terms of protein blocks from sequence. J. Biosci., 2007, 32, 51-70.
[25] Joseph, A.P.; Agarwal, G.; Mahajan, S.; Gelly, J.C.; Swapna, L.S.; Offmann, B.; Cadet, F.; Bornot, A.; Tyagi, M.; Valadié, H.; Schneider, B.; Etchebest, C.; Srinivasan, N.; de Brevern, A.G. A short survey on Protein Blocks. Biophys. Rev., 2010, 2(3), 137-145.
[26] Benros, C.; de Brevern, A.G.; Etchebest, C.; Hazout, S. Assessing a novel approach for predicting local 3D protein structures from sequence. Proteins, 2006, 62, 865-880.
[27] Benros, C.; de Brevern, A.G.; Hazout, S. Analyzing the sequence-structure relationship of a library of local structural prototypes. J. Theor. Biol., 2009, 256, 215-226.
[28] Bornot, A.; Etchebest, C.; de Brevern, A.G. A new prediction strategy for long local protein structures using an original description. Proteins, 2009, 76, 570-587.
[29] de Brevern, A.G.; Hazout, S. Hybrid Protein Model (HPM): a method to compact protein 3D-structures information and physicochemical properties. IEEE Computer Soc., 2000, S1, 49-54.
[30] de Brevern, A.G.; Hazout, S. 'Hybrid protein model' for optimally defining 3D protein structure fragments. Bioinformatics, 2003, 19, 345-353.
[31] Fourrier, L.; Benros, C.; de Brevern, A.G. Use of a structural alphabet for analysis of short loops connecting repetitive structures. BMC Bioinformatics, 2004, 5, 58.
[32] Tyagi, M.; Bornot, A.; Offmann, B.; de Brevern, A.G. Analysis of loop boundaries using different local structure assignment methods. Protein Sci., 2009, 18, 1869-1881.
[33] Tyagi, M.; Bornot, A.; Offmann, B.; de Brevern, A.G. Protein short loop prediction in terms of a structural alphabet. Comput. Biol. Chem., 2009, 33, 329-333.
[34] Faure, G.; Bornot, A.; de Brevern, A.G. Protein contacts, inter-residue interactions and side-chain modelling. Biochimie, 2008, 90, 626-639.
[35] Dong, Q.W.; Wang, X.L.; Lin, L. Methods for optimizing the structure alphabet sequences of proteins. Comput. Biol., 2007, 37, 1610-1616.
[36] de Brevern, A.G. New opportunities to fight against infectious diseases and to identify pertinent drug targets with novel methodologies. Infect. Disord. Drug Targets, 2009, 9, 246-247.
[37] de Brevern, A.G.; Autin, L.; Colin, Y.; Bertrand, O.; Etchebest, C. In silico studies on DARC. Infect. Disord. Drug Targets, 2009, 9, 289-303.
[38] de Brevern, A.G.; Wong, H.; Tournamille, C.; Colin, Y.; Le Van Kim, C.; Etchebest, C. A structural model of a seven-transmembrane helix receptor: the Duffy antigen/receptor for chemokine (DARC). Biochim. Biophys. Acta, 2005, 1724(3), 288-306.
[39] Dudev, M.; Lim, C. Discovering structural motifs using a structural alphabet: application to magnesium-binding sites. BMC Bioinformatics, 2007, 8, 106.
[40] Etchebest, C.; Benros, C.; Bornot, A.; Camproux, A.C.; de Brevern, A.G. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. Eur. Biophys. J., 2007, 36, 1059-1069.
[41] Tyagi, M.; Gowri, V.S.; Srinivasan, N.; de Brevern, A.G.; Offmann, B. A substitution matrix for structural alphabet based on structural alignment of homologous proteins and its applications. Proteins, 2006, 65, 32-39.
[42] Tyagi, M.; Sharma, P.; Swamy, C.S.; Cadet, F.; Srinivasan, N.; de Brevern, A.G.; Offmann, B. Protein Block Expert (PBE): a web-based protein structure analysis server using a structural alphabet. Nucleic Acids Res., 2006, 34, W119-W123.
[43] Tyagi, M.; de Brevern, A.G.; Srinivasan, N.; Offmann, B. Protein structure mining using a structural alphabet. Proteins, 2008, 71, 920-937.
[44] Karchin, R.; Cline, M.; Mandel-Gutfreund, Y.; Karplus, K. Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry. Proteins, 2003, 51, 504-514.
[45] http://bioinformatics.univ-reunion.fr/PBE/pb_prediction/
[46] Murzin, A.G.; Brenner, S.E.; Hubbard, T.; Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 1995, 247(4), 536-540.
[47] Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res., 2000, 28, 235-242.
[48] Kelley, L.A.; Sutcliffe, M.J. OLDERADO: on-line database of ensemble representatives and domains. Protein Sci., 1997, 6(12), 2628-2630.
[49] Pearson, W.R. Empirical statistical estimates for sequence similarity searches. J. Mol. Biol., 1998, 276, 71-84.
[50] Mitrophanov, A.Y.; Borodovsky, M. Statistical significance in biological sequence analysis. Brief. Bioinform., 2006, 7, 2-24.
[51] Gribskov, M.; Robinson, N.L. Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput. Chem., 1996, 20, 25-33.

Received: June 06, 2011
Revised: July 12, 2011
Accepted: August 01, 2011
