Articles Bioinformatics
July 2010 Vol.55 No.20: 2169–2174 doi: 10.1007/s11434-010-3199-z
SPECIAL TOPICS:
Evaluation of spatial epitope computational tools based on experimentally-confirmed dataset for protein antigens
XU XiaoLian1,2, SUN Jing2,3, LIU Qi3, WANG XiaoJing2, XU TianLei2,3, ZHU RuiXin3, WU Di2,3* & CAO ZhiWei1,2,3*
1 State Key Laboratory of Bioreactor Engineering, East China University of Science & Technology, Shanghai 200237, China;
2 Shanghai Center for Bioinformation Technology, Shanghai 200235, China;
3 School of Life Science and Technology, Tongji University, Shanghai 200292, China
Received September 1, 2009; accepted December 29, 2009
Antibody molecules interact with antigen proteins through the epitope area, where the epitope residues are usually discontinuous (spatial or conformational) rather than linear on the protein surface. Various computational algorithms have been developed to predict spatial epitopes, and each of them reports outstanding performance on its own testing dataset. In this work, an independent dataset was created by collecting epitope residues that have been confirmed by experiments. Based on this dataset, six popular web-servers developed for B-cell structural epitope prediction, namely SEPPA, CEP, DiscoTope, ElliPro, PEPOP and BEpro, were evaluated and compared according to sensitivity, the positive predictive value, the successful pick-up rate and the area under the receiver operating characteristic curve (AUC). The results showed that the general performance of spatial epitope prediction tools has advanced substantially, and SEPPA gave the best performance among the six tools. However, the current prediction accuracy is still far from satisfactory. Moreover, our comparison indicated that the performance of the web-servers was significantly affected by their training datasets and the algorithms adopted. In this sense, the results of our research may help to improve the design of B-cell epitope prediction tools and provide additional guidance for users who apply these tools in their research.
Keywords: discontinuous epitope, conformational epitope, independent dataset, epitope prediction, protein antigen
Citation: Xu X L, Sun J, Liu Q, et al. Evaluation of spatial epitope computational tools based on experimentally-confirmed dataset for protein antigens. Chinese Sci Bull, 2010, 55: 2169–2174, doi: 10.1007/s11434-010-3199-z
*Corresponding authors (email: [email protected]; [email protected])
© Science China Press and Springer-Verlag Berlin Heidelberg 2010
As the protective mechanism of human beings, the immune system recognizes and defends against foreign antigens, and the adaptive immune system, or antibody system, is considered to be the dominant process [1]. Specific antibodies are developed gradually by B lymphocytes to interact with and neutralize the corresponding antigens, and the recognition of antigens depends on a cluster of sites on the antigen surface named the epitope [2]. Among the different types of antigens, protein antigens are the ones that have been most intensively investigated so far. Analysis of protein epitopes has attracted increasing attention because it is expected to facilitate the design of monoclonal antibodies and even novel vaccines, especially at the current time of continuous outbreaks of newly emerging diseases. However, identification of epitope sites for protein antigens by wet-lab experiments is highly time-consuming and labor-intensive. Therefore, it would be helpful if computational methods could be used to make predictions before the experiments are designed. Much effort has been devoted to epitope prediction, and some results have been reported for linear epitope prediction [2]. Nevertheless, more and more crystallographic studies of antigen-antibody complexes have shown that most B-cell epitope residues in protein antigens, up to about 90%, are discontinuous [3],
which makes spatial epitope prediction much more challenging than linear epitope prediction. With the accumulation of 3D structures of antibody-antigen complexes in the PDB database [4], several computational algorithms have been developed to predict spatial (discontinuous) epitopes. Popular and recent web-servers include CEP [5], DiscoTope [3], PEPOP [6], ElliPro [7], BEpro [8] and SEPPA [9], which make predictions mainly from the 3D coordinates of a protein antigen rather than from 1D sequence information. CEP was the first tool for conformational epitope prediction and is mainly based on the solvent accessibility of surface residues in antigen proteins. Similarly, PEPOP applied an improved conformational feature, the solvent accessible surface cluster, to predict discontinuous epitopes. In contrast, ElliPro first approximates the surface of a protein antigen as an ellipsoid and calculates the protrusion index of surface residues; the protrusion index is then combined with the distance between residues in the antigen to predict spatial epitopes. DiscoTope and BEpro give more consideration to the spatial distribution and solvent accessibility of surface residues to achieve better prediction results. SEPPA, released in 2009, adopted the local spatial context on the protein surface and the spatial compactness of surface residues as two novel features to predict conformational epitope residues. With more and more computational tools available online, each claiming outstanding performance on its own testing dataset, ordinary users often find it difficult to choose among them, especially when different tools give different results. An evaluation of these tools is therefore desirable. In 2007, an evaluation study [10] was carried out on CEP and DiscoTope with a testing dataset derived from 59 structures of antigen-antibody complexes.
The results of that analysis indicated that the performance of the methods designed exclusively for spatial epitope prediction of protein antigens fell short of expectation, because their prediction accuracy was not significantly better than that of protein-protein docking methods such as PatchDock [11] and ClusPro [12]. Since then, four new methods have been proposed for conformational epitope prediction: ElliPro, BEpro, PEPOP and SEPPA. With their own testing datasets, these tools claimed progressively better performance than the previous tools. However, the size and coverage of the datasets adopted by different prediction tools were quite diverse. For instance, 39 structures of antigen-antibody complexes were selected for ElliPro and 76 structures were used for DiscoTope, while SEPPA collected 82 non-redundant structures representing 464 immune complexes. Given the different sizes of these datasets and the varying degrees of overlap between them, it is difficult to compare the performance of the tools directly on their own datasets. Did they really make improvements? How much improvement have they obtained? Are the latest algorithms already satisfactory? To answer these questions, an objective and systematic evaluation of these epitope prediction tools is necessary. It is notable that in the previous datasets, the spatial epitope residues were all derived computationally from the crystal structures of antigen-antibody complexes rather than from experimental results. Since different computational parameters and cut-offs may produce different residue sets even for the same structure, this may generate false positives, especially at the margin of the interaction site. For a fair evaluation, an ideal testing dataset would contain only residues that have each been confirmed by experimental results. Therefore, we collected all the spatial epitope residues with experimental support from the literature, the Immune Epitope Database and other resources. After mapping them to the crystal structures of antibody-antigen complexes in the PDB database, an independent testing dataset of experimentally validated spatial epitopes was established to evaluate the performance of the six computational tools.
1 Materials and methods
1.1 The independent testing dataset
The testing dataset was composed of experimentally validated spatial epitopes from the literature, the Immune Epitope Database (IEDB, http://www.immuneepitope.org/) [13] and the Conformational Epitope Database (CED, http://web.kuicr.kyoto-u.ac.jp/~ced) [14], which related to 183 structures of antibody-antigen complexes. To exclude peptide antigens, antigens shorter than 50 residues were removed from the dataset, and 110 antigen-antibody complexes were retained as the testing dataset. The PDB IDs of these complexes are listed in the Supporting Information (Table S1). In these complexes, some antigens may have several epitope segments determined experimentally. To guarantee that each antibody has a unique discontinuous epitope on a protein antigen, the epitope segments in one complex were merged and treated as a single spatial epitope. In total, 110 spatial epitopes were collected in the testing dataset, corresponding to the 110 antigen-antibody complexes.
1.2 Prediction tools for B-cell spatial epitopes of protein antigens
Six prediction tools for B-cell spatial epitopes of protein antigens were selected for evaluation: CEP, PEPOP, SEPPA, DiscoTope, ElliPro and BEpro. These methods fall into two categories according to how their results are reported. The first category includes SEPPA, DiscoTope and BEpro, which assign a score to each potential epitope residue on the surface of protein antigens; their results focus on individual epitope residues. The second category includes CEP, PEPOP and ElliPro, which detect several potential epitope patches based on the generalized features of residue clusters on protein surfaces; their results are given as clusters or patches of conformational epitope residues.
1.3 Parameters used for evaluation
Given the limitations of existing experimental methods and results, it is difficult to determine whether the potential epitope area on the protein surface has been completely detected. Thus, the accuracy of prediction results is judged only against the experimental data collected so far. For the residues located on the surface of protein antigens, each web-server gives a positive (Yes) or negative (No) prediction of whether a residue is an epitope residue. Positive predictions are counted as true positives (TP) if they have been confirmed by experimental results, and as false positives (FP) otherwise. Similarly, negative predictions are divided into true negatives (TN) and false negatives (FN). Because the two kinds of prediction tools report different outputs, TP, FP, FN and TN are calculated differently for each kind to avoid bias. For BEpro, DiscoTope and SEPPA, the prediction results for individual residues are compared directly with the experimental data. For CEP, PEPOP and ElliPro, TP, FP, FN and TN are first determined residue by residue for every predicted epitope patch, and the values are then averaged over all patches. Sensitivity, specificity, the positive predictive value, the successful pick-up rate and the area under the receiver operating characteristic curve (AUC) are calculated from TP, FP, FN and TN [15] as follows.
(i) Sensitivity (Se). The proportion of correctly predicted epitope residues (TP) among the real epitope residues (TP + FN). Se measures the capacity of the tool to identify true epitope residues:
Sensitivity = TP/(TP + FN). (1)
(ii) Specificity (Sp). The proportion of correctly predicted non-epitope residues (TN) among the real non-epitope residues (TN + FP). Sp measures the capacity of the tool to identify non-epitope residues:
Specificity = TN/(TN + FP). (2)
(iii) Positive predictive value (PPV). The proportion of correctly predicted epitope residues (TP) among the predicted epitope residues (TP + FP):
Positive predictive value = TP/(TP + FP). (3)
(iv) Successful pick-up rate. Based on TP, FP, FN and TN, the statistical significance of each prediction is determined with Fisher's exact test (right-tailed) [10]. A prediction is considered statistically significant when the P-value is less than 0.05, and such a result is counted as a successful pick-up. The successful pick-up rate of a prediction method is defined as the proportion of predictions with statistical significance among all spatial epitopes in the testing dataset.
(v) Receiver operating characteristic (ROC) curve. A ROC curve is often used to evaluate the overall performance of prediction tools. Owing to the character of the testing dataset and the prediction results, epitope prediction tools are treated as discrete classifiers, which produce a discontinuous curve for the testing dataset. Consequently, the ROC curve is not produced from the prediction results directly. Since the ROC curve and the AUC value are highly correlated, the estimated AUC value is used in place of the ROC curve to compare the performance of the six tools. The estimated AUC value [10] is calculated as:
AUC value = 0.5 × (x + y), (4)
where y is the sensitivity and x is the specificity. In this work, the default thresholds of the prediction tools are used: 0.7 for the protrusion index of surface residues and 6 Å for the distance between neighboring residues in ElliPro [7], and thresholds of −7.7, 1.8 and 1.3 for DiscoTope [3], SEPPA [9] and BEpro [8], respectively.
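The evaluation parameters defined in this section can be sketched in code. The following is a minimal illustration, not the scripts used in this work: the names `Counts`, `count_residues` and `average_over_patches`, the treatment of empty denominators as 0.0 and the toy residue numbers are all assumptions made for the example. It computes formulae (1) to (4) for a residue-level prediction and shows one reading of the patch-averaging rule described for CEP, PEPOP and ElliPro.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: float
    fp: float
    fn: float
    tn: float


def count_residues(predicted, experimental, surface):
    """Compare predicted and experimentally confirmed residues over the surface residues."""
    surface = set(surface)
    predicted = set(predicted) & surface
    experimental = set(experimental) & surface
    tp = len(predicted & experimental)
    fp = len(predicted - experimental)
    fn = len(experimental - predicted)
    tn = len(surface) - tp - fp - fn
    return Counts(tp, fp, fn, tn)


def average_over_patches(patches, experimental, surface):
    """Patch-style output (assumed reading): score each patch, then average the counts."""
    per_patch = [count_residues(p, experimental, surface) for p in patches]
    n = len(per_patch)
    return Counts(
        sum(c.tp for c in per_patch) / n,
        sum(c.fp for c in per_patch) / n,
        sum(c.fn for c in per_patch) / n,
        sum(c.tn for c in per_patch) / n,
    )


def _ratio(num, den):
    # Assumption: an empty denominator is reported as 0.0 rather than raising.
    return num / den if den else 0.0


def sensitivity(c):
    return _ratio(c.tp, c.tp + c.fn)      # formula (1)


def specificity(c):
    return _ratio(c.tn, c.tn + c.fp)      # formula (2)


def positive_predictive_value(c):
    return _ratio(c.tp, c.tp + c.fp)      # formula (3)


def estimated_auc(c):
    return 0.5 * (specificity(c) + sensitivity(c))  # formula (4)


# Toy antigen with 20 surface residues (hypothetical numbering).
surface = set(range(1, 21))
experimental = {1, 2, 3, 4, 5}
residue_counts = count_residues({3, 4, 5, 6, 7, 8}, experimental, surface)
patch_counts = average_over_patches([{1, 2, 3}, {10, 11}], experimental, surface)
print(residue_counts, round(estimated_auc(residue_counts), 3))
print(patch_counts, round(sensitivity(patch_counts), 3))
```

Because formula (4) uses only one sensitivity and one specificity, it describes a single operating point, which is why the default threshold of each tool matters for the comparison.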
2 Results
2.1 Analysis of the testing dataset for spatial epitopes
The testing dataset includes 110 spatial epitopes containing 2281 epitope residues. The number of residues in a real epitope varies from 5 to 67, and the average number is 25. Among the conformational epitopes, 65 consist of 10 to 30 residues, corresponding to 59.1% of the 110 spatial epitopes in the dataset. The smallest one is the epitope of HEL recognized by HyHEL-10 (PDB ID: 3HFM), which includes only 5 residues. The antigenic site of this protein antigen is located at a cleft that accommodates a polysaccharide substrate around the disulphide bond. Possibly because of the space constraint, only 5 residues on the surface of the HEL antigen were determined to be active in binding its antibody by 'surface-simulation' synthesis [16]. Because the epitope residues in the previous datasets were all derived automatically from the crystal structures of antigen-antibody complexes by the computational method of each tool, the structures shared between the previous and current datasets were identified at a sequence similarity above 90%. The computationally defined epitope residues were then compared with the experimentally detected ones in the overlapping protein antigens. Among the 76 antigen-antibody complexes used by DiscoTope, 60 structures were also included in the testing dataset. In the DiscoTope dataset, 1041 epitope residues were computationally defined for these 60 complexes, whereas in our current dataset only 923 epitope residues had experimental support, accounting for 88.7%. Of the 39 complexes in ElliPro, 23 structures were shared with the testing dataset; 317 residues were computationally defined as epitope residues in these 23 structures and 251 residues were experimentally determined, accounting for 79.2%. SEPPA used 82 representative antigen-antibody complexes as the training dataset, of which 69 structures overlapped with the testing dataset; 1143 residues were computationally defined, of which 1028 residues belonged to the testing dataset, accounting for 89.9%.
2.2 Performance evaluation of B-cell spatial epitope prediction tools
Based on the 110 spatial epitopes in the dataset, the overall performance of all methods was evaluated with the averaged AUC values. The results are presented in Figure 1 (further details are shown in Table S2). In general, an averaged AUC value closer to 0.5 indicates poorer prediction performance [17]. As shown in Figure 1, the averaged AUC values of the prediction tools ranged from 0.50 to 0.62. SEPPA gave the best performance among the six tools on the current testing dataset, with an averaged AUC value of 0.62. DiscoTope and BEpro obtained averaged AUC values of 0.58 and 0.55, respectively. The averaged AUC values for CEP, PEPOP and ElliPro did not exceed 0.55. These results indicate that the recently developed tools, especially SEPPA and DiscoTope, have advanced compared with their predecessors. This achievement could be attributed not only to the accumulation of more structural data but also to the incorporation of new spatial features. However, given that a satisfactory AUC value is usually defined as above 0.70 [17], all algorithms still have room for improvement.
As mentioned above, it is difficult to determine whether all potential spatial epitopes on the surface of protein antigens have been detected, owing to various limitations. Therefore, evaluating the programs only through AUC values estimated from sensitivity and specificity may be biased, because these values are affected by the true negatives (TN) and false positives (FP) in the predictions. In previous studies, sensitivity and the positive predictive value (PPV) [6–10] have also been used to evaluate the performance of conformational epitope prediction tools, mainly because the positive predictive value avoids the interference of true negative predictions. Hence, sensitivities and positive predictive values were also calculated according to formulae (1) and (3) to further evaluate the capacity of each tool to recognize spatial epitopes (Tables S1 and S3). The results indicated that the sensitivities and positive predictive values of the six methods were also unfavorable, similar to the AUC results. The best performance among the six methods was a sensitivity of 0.49 and a positive predictive value of 0.27 (P < 2.2×10⁻¹⁶, Fisher's exact test) for SEPPA. The sensitivities and positive predictive values of the other methods were lower than 0.40 and 0.25, respectively (Table 1). Such performance is much lower than the values claimed by each method on its own dataset. For instance, SEPPA [9] claimed a sensitivity of 0.580 under its default threshold of 1.80, DiscoTope [3] claimed a sensitivity of 0.47 and ElliPro [7] reported a sensitivity of 0.10 with their own datasets. On the other hand, it is also necessary to compare the ability of the tools to pick up true epitope residues from random surface residues. The successful pick-up rates of the six prediction tools were calculated from the P-values determined by Fisher's exact test (right-tailed).
The results are presented in Table 1 and the Supporting Information (Table S4). The ability of all methods to pick up real epitope residues, as opposed to a random selection of surface residues, was also unsatisfactory, with the highest successful pick-up rate not exceeding 60%. SEPPA achieved the best performance among all tools with a successful pick-up rate of 55.5%. DiscoTope achieved a successful pick-up rate of 40.0% and BEpro achieved 28.2%. Surprisingly, the successful pick-up rates of CEP, PEPOP and ElliPro were less than 10%.

Figure 1 The averaged AUC value of prediction tools.

Table 1 Evaluation and comparison of the performance of prediction tools
Methods      Sensitivity   Positive predictive value   The successful pick-up rate (%)
SEPPA        0.4914        0.2650                      55.50
DiscoTope    0.3565        0.2116                      40.00
BEpro        0.1789        0.2205                      28.20
CEP          0.1774        0.1720                      8.18
PEPOP        0.1973        0.1946                      2.73
ElliPro      0.0676        0.1580                      3.64
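The successful pick-up criterion of Section 1.3 can be illustrated with a right-tailed Fisher's exact test written directly from the hypergeometric distribution. This is a sketch under stated assumptions rather than the procedure used to produce Table 1: the function names are hypothetical, the counts must be integers, and the two toy contingency tables are invented for the example.

```python
from math import comb


def fisher_right_tailed(tp, fp, fn, tn):
    """P-value for observing at least `tp` true positives by chance.

    The 2x2 table is [[TP, FP], [FN, TN]]; counts are assumed to be integers.
    """
    total = tp + fp + fn + tn
    epitope = tp + fn        # experimentally confirmed epitope residues
    predicted = tp + fp      # residues predicted as epitope
    upper = min(epitope, predicted)
    favourable = sum(
        comb(epitope, k) * comb(total - epitope, predicted - k)
        for k in range(tp, upper + 1)
    )
    return favourable / comb(total, predicted)


def pick_up_rate(tables, alpha=0.05):
    """Fraction of epitopes whose prediction is significant at level `alpha`."""
    hits = sum(1 for table in tables if fisher_right_tailed(*table) < alpha)
    return hits / len(tables)


# Two toy epitopes on a 20-residue surface (hypothetical counts).
toy_tables = [(3, 3, 2, 12), (5, 1, 0, 14)]
p_values = [fisher_right_tailed(*t) for t in toy_tables]
rate = pick_up_rate(toy_tables)
print(p_values, rate)
```

In this toy case only the second prediction falls below 0.05, so one of two epitopes counts as a successful pick-up. Where SciPy is available, `scipy.stats.fisher_exact` with `alternative="greater"` provides the same one-sided test.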
Based on the evaluations summarized above, SEPPA, DiscoTope and BEpro out-performed CEP, PEPOP and ElliPro. This may be partly caused by the different representations of prediction results: the patch-style outputs of CEP, PEPOP and ElliPro possibly correspond more closely to the form of real antigenic epitopes, but such predictions may contain more false positives.
3 Discussion
3.1 The influence of testing dataset on prediction
The size and variety of the training and testing datasets significantly affect the accuracy of the final prediction. The results have shown that the performance of all the web-servers tested was far from satisfactory on the experimentally-confirmed dataset. One reason may lie in the limited number of antibody-antigen complexes included in the original dataset used by each method. A second reason is that the diversity of epitope features may not be fully captured if the training data on antigen epitopes are insufficient. In such a situation, the parameters selected by computational tools may not be able to describe the intrinsic character profile of conformational epitopes completely. SEPPA out-performed all other tools under the different rounds of evaluation, which possibly benefited from its use of the biggest and most representative dataset, containing non-redundant epitopes from 82 antigen-antibody complexes. Therefore, including more representative epitope data is expected to help prediction tools achieve higher accuracy. On the other hand, the accuracy of the epitope data also affects the performance of prediction tools. In previous computational methods, epitope residues were derived directly from the structure of the antigen-antibody complex, either through the change in solvent accessible surface area (SASA) between the unbound and bound antigen proteins, such as ΔSASA ≥ 1 Å² in SEPPA [9], or through a distance cutoff between the atoms of antigen and antibody residues in a complex, such as 4 Å in DiscoTope [3]. Although a comprehensive structural view of epitopes can be generated with these two definitions, some residues will be counted as epitope residues even though they may not be directly involved in the interactions with antibody molecules. This could lead to discrepancies between the experimental and computational definitions of epitope residues [6].
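The distance-cutoff definition mentioned above can be made concrete with a small sketch. It is an illustration only: the function name, the data layout (a dictionary of residue identifiers to atom coordinates), the use of "less than or equal to" at the cutoff and the toy coordinates are assumptions, and real structures would be read from PDB files. It shows how residues at the margin of the interface enter or leave the computed epitope when the cutoff changes.

```python
import math


def epitope_by_distance(antigen_atoms, antibody_atoms, cutoff=4.0):
    """Label an antigen residue as epitope if any of its atoms lies within
    `cutoff` angstroms of any antibody atom.

    antigen_atoms: dict mapping residue id -> list of (x, y, z) tuples
    antibody_atoms: list of (x, y, z) tuples
    """
    epitope = set()
    for residue_id, atoms in antigen_atoms.items():
        if any(
            math.dist(atom, other) <= cutoff
            for atom in atoms
            for other in antibody_atoms
        ):
            epitope.add(residue_id)
    return epitope


# Toy coordinates (hypothetical): one antibody atom at the origin.
antibody = [(0.0, 0.0, 0.0)]
antigen = {
    "A10": [(3.0, 0.0, 0.0)],   # clearly in contact
    "A11": [(4.5, 0.0, 0.0)],   # marginal residue
    "A12": [(9.0, 0.0, 0.0)],   # far from the antibody
}
strict = epitope_by_distance(antigen, antibody, cutoff=4.0)
loose = epitope_by_distance(antigen, antibody, cutoff=5.0)
print(sorted(strict), sorted(loose))
```

The marginal residue A11 is an epitope residue under the 5 Å cutoff but not under the 4 Å cutoff, which is the kind of parameter-dependent difference that an experimentally confirmed dataset avoids.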
In this research, we compared the computationally derived datasets with the experimentally confirmed one. The real epitope residues with experimental support accounted for only about 80% to 90% of the computationally derived residues (79.2% to 89.9%; Section 2.1). Although it is difficult to judge whether the remaining computationally defined epitope residues are real ones, including them in the training data will likely affect the accuracy of prediction. In this research, the testing data of 110 epitopes were created from experimental results, which excluded false positives to the greatest possible extent and consequently gave a more impartial evaluation.
3.2 The influence of algorithms on prediction
Different algorithms and parameters of prediction methods also affect the accuracy of results, especially the various features used to describe epitope residues. This issue was noted by Ponomarenko and Bourne, who pointed out that more effective features, such as the evolutionary conservation score, may greatly improve the accuracy of epitope prediction [10]. With more prediction methods available, some novel features have been applied to characterize spatial epitopes. However, the improvement in prediction accuracy is not as encouraging as expected. The analysis has shown that the current features are still not enough to capture the intrinsic characteristics of antigenic epitopes completely, although several features related to antigenicity, such as the solvent accessibility of residues, spatial distributions and combinations of amino acid propensities, have been incorporated. For example, in DiscoTope a short consecutive peptide on the sequence is selected to calculate the log-odds for epitope residues. Such a calculation treats every residue within the peptide with equal weight, so non-epitope residues in the consecutive peptide may reduce the relevance of the log-odds and the antigenicity scores. The number of contacting Cα atoms is used to determine the surface exposure of each residue in DiscoTope, which is not applicable to the surface residues of some membrane proteins (PDB ID: 1E08). As for ElliPro, two concepts have been introduced. First, regions protruding from the globular surface of protein antigens interact readily with antibodies [18]. Second, the protein is approximated as a simple ellipsoid to determine the protrusions [19]. However, these concepts are not applicable to proteins with multiple domains or to large single-domain proteins that undergo conformational change.
BEpro, CEP and PEPOP are based on the detection of exposed residues, so residues buried in the structure are ignored even though they may also be involved in interactions with antibodies, and this may affect the reliability of predictions [6]. CEP may make false positive predictions when the relatively large binding sites in the complexes of lysozyme and neuraminidase are tested [5]. Although SEPPA describes the local spatial context and the spatial compactness of surface residues to predict conformational epitopes, these features still need to be refined to discriminate epitope residues from other surface residues more finely. For a process as complicated as the molecular recognition of antigens by antibodies, more comprehensive features remain to be formulated to capture the chemical properties and structural characteristics involved.
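The contact-number measure of surface exposure discussed above for DiscoTope can be sketched as follows. This is not DiscoTope's implementation: the function name, the 10 Å radius and the toy Cα coordinates are assumptions chosen for illustration. A residue with few neighboring Cα atoms is taken to be more exposed.

```python
import math


def contact_numbers(ca_coords, radius=10.0):
    """Count, for each residue, the other C-alpha atoms within `radius` angstroms.

    ca_coords: dict mapping residue id -> (x, y, z) of its C-alpha atom
    """
    numbers = {}
    for residue_id, centre in ca_coords.items():
        numbers[residue_id] = sum(
            1
            for other_id, other in ca_coords.items()
            if other_id != residue_id and math.dist(centre, other) <= radius
        )
    return numbers


# Toy chain (hypothetical): four C-alpha atoms spaced 6 angstroms apart on a line.
toy_chain = {
    1: (0.0, 0.0, 0.0),
    2: (6.0, 0.0, 0.0),
    3: (12.0, 0.0, 0.0),
    4: (18.0, 0.0, 0.0),
}
numbers = contact_numbers(toy_chain)
print(numbers)
```

The count considers protein atoms only, so anything else surrounding a residue, such as a lipid environment, is invisible to this measure.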
4 Conclusion
In this work, experimentally validated conformational epitopes were collected into an independent testing dataset for the current and future evaluation of computational programs for B-cell discontinuous epitope prediction for protein antigens. The overall performance of six popular and recent tools was evaluated and compared through sensitivity, the positive predictive value, the successful pick-up rate and the averaged AUC value based on this dataset. The results showed that: (i) the general performance of epitope prediction tools has been substantially improved; (ii) the performance of SEPPA, DiscoTope and BEpro is better than that of CEP, PEPOP and ElliPro, and SEPPA out-performs all other tools under the different rounds of evaluation; (iii) in spite of that, there is still room to improve the predictions. Further efforts could be devoted to constructing high-quality training and testing datasets and to formulating more refined spatial features, such as the local environment of the epitope area, cooperative effects of residues, and the structural effects upon complex formation. This work may help to improve the design of B-cell epitope prediction tools and provide additional guidance for users who apply these programs in their research.
This work was supported by the National Basic Research Program of China (2010CB833601 and 2006AA02312), Shanghai Education Development Foundation (2000236018 and 2000236016), Young Excellent Talents in Tongji University (2008KJ073) and Shanghai Municipal Natural Science Foundation (07ZR14085).

1 Korber B, LaBute M, Yusim K. Immunoinformatics Comes of Age. Plos Comput Biol, 2006, 2: 6484–6492
2 Greenbaum J A, Andersen P H, Blythe M, et al. Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools. J Mol Recognit, 2007, 20: 75–82
3 Haste-Andersen P, Nielsen M, Lund O. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci, 2006, 15: 2558–2567
4 Berman H M, Westbrook J, Feng Z, et al. The Protein Data Bank. Nucleic Acids Res, 2000, 28: 235–242
5 Kulkarni-Kale U, Bhosle S, Kolaskar A S. CEP: A conformational epitope prediction server. Nucleic Acids Res, 2005, 33(Web Server issue): W168–W171
6 Moreau V, Fleury C, Piquer D, et al. PEPOP: Computational design of immunogenic peptides. BMC Bioinformatics, 2008, 9: 71–86
7 Ponomarenko J, Bui H H, Li W, et al. ElliPro: A new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics, 2008, 9: 514–522
8 Sweredoski M J, Baldi P. PEPITO: Improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics, 2008, 24: 1459–1460
9 Sun J, Wu D, Cao Z W, et al. SEPPA: A computational server for spatial epitope prediction of protein antigens. Nucleic Acids Res, 2009, 37(Web Server issue): W612–616
10 Ponomarenko J V, Bourne P E. Antibody-protein interactions: Benchmark datasets and prediction tools evaluation. BMC Struct Biol, 2007, 7: 64–83
11 Schneidman-Duhovny D, Inbar Y, Polak V, et al. Taking geometry to its edge: Fast unbound rigid (and hinge-bent) docking. Proteins, 2003, 52: 107–112
12 Comeau S R, Gatchell D W, Vajda S, et al. ClusPro: An automated docking and discrimination method for the prediction of protein complexes. Bioinformatics, 2004, 20: 45–50
13 Zhang Q, Wang P, Kim Y, et al. Immune epitope database analysis resource (IEDB-AR). Nucleic Acids Res, 2008, 36(Web Server issue): W513–518
14 Huang J, Honda W. CED: A conformational epitope database. BMC Immunol, 2006, 7: 7–15
15 Sonego P, Kocsor A, Pongor S. ROC analysis: Applications to the classification of biological sequences and 3D structures. Brief Bioinform, 2008, 9: 198–209
16 Atassi M Z, Lee C L. Boundary refinement of the lysozyme antigenic site around the disulphide bond 6–127 (site 1) by ‘surface-simulation’ synthesis. Biochem J, 1978, 171: 419–427
17 Huang J, Honda W, Kanehisa M. Predicting B cell epitope residues with network topology based amino acid indices. Genome Inform, 2007, 19: 40–49
18 Thornton J M, Edwards M S, Taylor W R, et al. Location of ‘continuous’ antigenic determinants in the protruding regions of proteins. Embo J, 1986, 5: 409–413
19 Taylor W R, Thornton J M, Turnell W G. An ellipsoidal approximation of protein shape. Mol Graph, 1990, 1: 30–38
Supporting Information
Table S1 PDB IDs for antigen-antibody complexes included in the testing dataset
Table S2 AUC values for prediction tools
Table S3 Sensitivities and the positive predictive values (PPV) for prediction tools
Table S4 Statistical significance of prediction results and the successful pick-up rates for prediction tools
The supporting information is available online at csb.scichina.com and www.springerlink.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.