International Journal of Digital Content Technology and its Applications Volume 3, Number 2, June 2009
Generating Rules for Predicting MHC Class I Binding Peptide using ANN and Knowledge-based GA
Yeon-Jin Cho1, Hyeoncheol Kim*1 and Heung-bum OH2
*1 Dept. of Computer Science Education, Korea University, Seoul, 136-701, Korea
2 Dept. of Laboratory Medicine, University of Ulsan and Asan Medical Center, Seoul, 138-736, Korea
{jx, hkim}@comedu.korea.ac.kr, [email protected]
doi: 10.4156/jdcta.vol3.issue2.cho
* To whom correspondence should be addressed.

Abstract
Cytotoxic T cells recognize specific peptides bound to major histocompatibility complex (MHC) class I molecules. Accurate prediction of the binding peptides could be of much use for the design of efficient peptide vaccines and could substantially reduce the cost of synthesizing and testing candidate binders. In this paper, we demonstrate that a machine learning approach can be successfully applied to extract rules for predicting MHC class I binding peptides. We introduce a new method using a feed-forward neural network and a genetic algorithm, and show that the proposed method outperforms other methods in both the quantity and the quality of the prediction rules. To verify the rules generated by our method, we compared them with the known rules available in the HLA FactsBook. Our method successfully identified most of the known rules and found some additional new rules for HLA-A*0204 and HLA-B*2706. We also found new rules for HLA-A*3301, for which no rules have been reported before.

Keywords
MHC class I peptide prediction, Knowledge-Based Genetic Algorithm, ANN

1. Introduction
T-cell mediated immune responses are triggered through peptide recognition by T lymphocytes. Among T lymphocytes, cytotoxic T cells recognize specific peptides bound to major histocompatibility complex (MHC) class I molecules to provoke a cytotoxic immune response, which involves the eradication of virus-infected cells. It is generally accepted that the stronger the binding between a peptide and an MHC allele, the stronger the immune response that T lymphocytes evoke. Therefore, prediction of MHC-binding properties is very useful for the design of efficient peptide vaccines that elicit a strong immune response against foreign antigens [1, 8, 11]. Moreover, accurate prediction of binding peptides can substantially reduce the cost of synthesizing and testing candidate binders, because only about one percent of potential binders actually bind to an MHC molecule [6, 13].

Methods of predicting MHC-peptide binding are divided into two groups: sequence-based and structure-based methods. The latter predict peptide binding based on the fit of a peptide to the groove of the MHC molecule [13] and are limited to MHC molecules with known structures, whereas the sequence-based approach can exploit the growing body of information on peptide binders [18, 19]. The former include profile-based prediction methods such as SYFPEITHI and HLA-BIND.

Several machine learning methods have been applied to this prediction problem, such as the ANN (artificial neural network), the HMM (hidden Markov model) and the SVM (support vector machine). Honeyman, et al. [8, 17] used ANN-based methods and reported a prediction accuracy of roughly 80% sensitivity and 80% specificity. Mamitsuka [6] used a single HMM to model MHC-peptide interactions for multiple alleles of HLA-A2 and reported accuracy similar to that of the ANN. An SVM (Donnes and Elofsson) achieved a specificity of 90% [13]. These machine learning algorithms are generally considered to perform better than the profile-based methods, because the former use information from both binders and non-binders [6, 13, 18] while the latter use binders only. Although some available prediction programs for particular MHC alleles have reasonable accuracy, there is no guarantee that all models produce good quality predictions [18].

In this article, we introduce a new method to generate rules for predicting peptides bound to MHC class I. Our approach involves neural network training using available data sets, knowledge extraction from the trained neural network, and rule exploration by genetic evolution using the neural network knowledge. We extended our previous intermediate research [20] by adding many more datasets for the experiments, using several more machine learning algorithms for performance comparison, and validating our results against the well-known motifs in the HLA FactsBook. The performance of our method is compared with various computational methods, including sequence logos, DT (decision tree), NN (neural network), SVM and GA, which have frequently been used as data classifiers in machine learning [2, 12].
Sequence logos visualize amino acid conservation in multiple sequence alignments. A sequence logo represents the relative frequency and information content at each position of protein and nucleotide sequences, and is useful for short sequences such as protein motifs [3]. In Figure 1, the total height of the letters at each position represents the amount of information the sequences carry at that position, and the font size of each letter shows its frequency of occurrence. However, when the given sequences do not reflect the entire information, accurate prediction at a position becomes infeasible, and the logo is not suitable for judging molecular specificity.
2. Data set and methods
2.1 Data Set Preparation
To make the performance comparison easy, we used the same datasets that were used in the experiments of Donnes and Elofsson [13]. They used 6 MHC alleles from the SYFPEITHI database and 32 MHC alleles from the MHCPEP database (the datasets were kindly made available by the authors through personal communication). SYFPEITHI is a database comprising more than 4500 peptide sequences known to bind class I and class II MHC molecules. It is a high quality database based only on published data, containing sequences of natural ligands and T-cell epitopes [4]. The MHCPEP database contains over 13000 peptide sequences known to bind MHC molecules, compiled from published and directly submitted experimental data [16]. MHCPEP contains more data, while the data in SYFPEITHI are generally believed to be more accurate.

As in [13], duplicated data were removed and the number of binders per allele was kept above 20 at all times. To obtain non-binders, proteins randomly extracted from the ENSEMBL database were cut into fragments of a fixed size, and all sequences listed in the MHC-peptide database were removed from them. Non-binders were used at twice the number of binders (for example, 40 non-binders for 20 binders), maintaining a binder to non-binder ratio of 1:2.

For neural network training, each position, which holds one of the 20 amino acids, is converted into 20 binary digits. For example, the amino acid Alanine (A) is represented by a string of 20 bits (10000000000000000000) [8]. Thus, each MHC-peptide sequence is encoded into 180 bits (9-mer peptide * 20) or 200 bits (10-mer peptide * 20). The class is encoded as either 1 (i.e., binder) or 0 (i.e., non-binder).
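The binary encoding described above can be illustrated with a minimal sketch. The ordering of the 20 amino acids within each 20-bit group is an assumption made here (alphabetical by one-letter code, which is consistent with Alanine being encoded as a leading 1); the names `AMINO_ACIDS` and `encode_peptide` are hypothetical and do not come from the paper.

```python
# Assumed residue ordering: alphabetical by one-letter code (A first).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_peptide(peptide):
    """Return a flat list of 0/1 values, 20 bits per residue (one-hot)."""
    bits = []
    for residue in peptide:
        index = AMINO_ACIDS.index(residue)  # raises ValueError for unknown symbols
        bits.extend(1 if i == index else 0 for i in range(len(AMINO_ACIDS)))
    return bits


nine_mer = encode_peptide("ALAKAAAAV")
print(len(nine_mer))   # 9 residues * 20 bits
print(nine_mer[:20])   # bit pattern for the first residue, Alanine
```

A 9-mer yields 180 input values and a 10-mer yields 200, matching the input layer sizes used for the neural network.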
2.2 Methods for rule extraction
2.2.1 Sequence Logo and Decision Tree
Figure 1. The sequence logos of MHC-peptides

The decision tree is one of the best-known classification techniques in symbolic machine learning. It uses a heuristic method to select the attribute that separates the samples most effectively into each class, using an entropy-based measure, information gain [7]. We used Quinlan's C5.0 algorithm and generated rules such as the following for HLA-A*0201: "if L@P2, then MHC-binder".

2.2.2 Feed-forward Neural Network and Rule Extraction
Honeyman, et al. [8] used feed-forward neural networks for the prediction of candidate T-cell epitopes. They showed that one advantage of machine learning algorithms is higher specificity compared to profile methods. However, they used a neural network only for the prediction of MHC-peptide binding. In this paper, we extract If-Then rules from the neural networks and then compare the generated rules with a well-known rule base, the HLA FactsBook. Our experiments include the following:
− Training of a feed-forward neural network with known MHC class I binding peptide data.
− Extraction of If-Then rules from the trained neural network.
− Performance comparison of the extracted rules with the well-known motifs in the HLA FactsBook.
We used a decompositional approach for rule extraction from a trained neural network, which involves the following phases:
1. Intermediate rules are extracted at the level of individual units within the network.
2. The intermediate rules from each unit are aggregated to form a composite rule base for the neural network. This step rewrites rules to eliminate symbols which refer to hidden units but are not predefined in the domain.
There have been many studies on the efficient extraction of valid and general rules from a trained neural network. In this paper, we used the OAS (Ordered-Attribute Search) algorithm to extract if-then rules from the neural network [5].
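To make phase 1 of the decompositional approach concrete, the following is a simplified sketch of rule extraction at the level of a single unit. It is not the OAS algorithm of [5], whose details are not given here. It assumes binary inputs that can be switched independently (ignoring the one-of-20 constraint of the peptide encoding) and a unit that is considered active when its net input is positive. The function names, weights and bias are hypothetical.

```python
from itertools import combinations


def guarantees_activation(weights, bias, on_inputs):
    """Check whether switching on `on_inputs` activates the unit in the worst case.

    Worst case: every other input with a negative weight is also on,
    and every other input with a positive weight is off.
    """
    on = set(on_inputs)
    worst = bias + sum(weights[i] for i in on)
    worst += sum(min(0.0, w) for i, w in enumerate(weights) if i not in on)
    return worst > 0


def extract_unit_rules(weights, bias, max_len=2):
    """Collect minimal conjunctions of inputs (up to max_len) that guarantee activation."""
    # Visit attributes in order of decreasing weight (an ordering assumed for this sketch).
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(order, size):
            if any(set(rule) <= set(combo) for rule in rules):
                continue  # a more general rule already covers this conjunction
            if guarantees_activation(weights, bias, combo):
                rules.append(tuple(sorted(combo)))
    return rules


# Hypothetical unit with four binary inputs.
print(extract_unit_rules([2.0, 1.0, -0.5, 0.8], bias=-1.2))
```

Each returned tuple reads as "IF all listed inputs are on, THEN the unit is active"; phase 2 would then substitute such unit-level rules into one another to remove references to hidden units.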
2.2.3 Rule Exploration using a Genetic Evolution Method
One of the research areas where the genetic algorithm (GA) is becoming increasingly popular is data interpretation and prediction in molecular biology. GAs have been used to interpret nuclear magnetic resonance (NMR) data that explain DNA structures, and research on using GAs for prediction tasks such as protein structure prediction is in progress [9, 21]. A GA consists of a fitness function that evaluates how well a solution fits the environment, chromosomes that represent potential solutions, and genetic operators that exchange information among chromosomes while forming a new generation. The best solution is found by improving the chromosomes gradually. A classifier rule can be found with a GA by representing a chromosome with the 20 amino acid symbols and a '*' (don't care) symbol. A fitter chromosome represents a rule with a higher sensitivity (or accuracy) value.

However, the performance of rules found by a GA is in general not stable, because it depends on the initial population, which is usually selected randomly. A traditional GA depends on the experience of the designer when setting up the initial population, because it lacks systematic methods for character allocation and coding. It is also likely to overlook potentially useful knowledge and ideas, because every computation proceeds randomly. Consequently, techniques that incorporate applicable theories and knowledge gained from experience have been suggested to improve the performance of GAs. To overcome the limitations of the simple genetic algorithm, we propose a KBGA (Knowledge-Based Genetic Algorithm) for classification, which initializes the population using the knowledge learned by the NN. Its aim is to search for the best-fitted value efficiently by inheriting the superior characteristics of genes. The decision of how to place the numeric values and the letters in a chromosome is made based on this knowledge.

A chromosome in the initial population can be expressed as a string over the 20 amino acid symbols and the '*' (don't care) symbol. Therefore, the size of the rule space is as large as 21^9 (or 21^10), and the GA-based model searches for the best-fitted set of chromosomes (i.e., rules) among these 21^9 (or 21^10) candidates. We want to find rules that are both maximally general and highly accurate. Therefore, we define our fitness function in terms of sensitivity (or accuracy) and rule generalization as follows:

f(n) = (TP / (TP + FP + 1)) × 100 + d

where TP (or FP) is the number of true positive (or false positive) instances matched by the chromosome rule and d is the number of '*' (don't care) symbols in the chromosome. The population is sorted by fitness, and the parents of the next generation are selected using roulette-wheel sampling. The offspring are produced by 1-point crossover and mutation at a rate of 0.01%.
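The procedure described in this section can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the population size, the number of generations, the way the seed rules are repeated to fill the initial population, and the bookkeeping of the best rule seen so far are all choices made here. Only the fitness formula, roulette-wheel sampling, 1-point crossover and the 0.01% mutation rate come from the text. All function names and the toy peptides are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WILDCARD = "*"
SYMBOLS = AMINO_ACIDS + WILDCARD  # 21 possible symbols per position


def matches(rule, peptide):
    """True if every non-wildcard position of the rule equals the peptide residue."""
    return len(rule) == len(peptide) and all(
        r == WILDCARD or r == p for r, p in zip(rule, peptide)
    )


def fitness(rule, binders, non_binders):
    """f = TP / (TP + FP + 1) * 100 + d, where d counts the wildcards in the rule."""
    tp = sum(matches(rule, p) for p in binders)
    fp = sum(matches(rule, p) for p in non_binders)
    return tp / (tp + fp + 1) * 100 + rule.count(WILDCARD)


def one_point_crossover(a, b, rng):
    """Swap the tails of two parents at one random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(rule, rate, rng):
    """Replace each position by a random symbol with probability `rate`."""
    return "".join(rng.choice(SYMBOLS) if rng.random() < rate else s for s in rule)


def evolve(seed_rules, binders, non_binders, pop_size=20, generations=50,
           mutation_rate=0.0001, seed=0):
    """Evolve rules starting from knowledge-based seeds (e.g. rules taken from the NN)."""
    rng = random.Random(seed)
    score = lambda rule: fitness(rule, binders, non_binders)
    # Knowledge-based initialization: repeat the seed rules to fill the population.
    population = [seed_rules[i % len(seed_rules)] for i in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        # Roulette-wheel sampling; the tiny offset keeps the weights valid if all scores are 0.
        weights = [score(rule) + 1e-9 for rule in population]
        offspring = []
        while len(offspring) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            for child in one_point_crossover(mother, father, rng):
                offspring.append(mutate(child, mutation_rate, rng))
        population = offspring[:pop_size]
        candidate = max(population, key=score)
        if score(candidate) > score(best):
            best = candidate  # remember the best rule seen in any generation
    return best


# Toy data for illustration only.
binders = ["AVAAAAAAR", "ALAAAAAAR", "GVCAAAAAR"]
non_binders = ["AAAAAAAAR", "AAAAAAAAA", "CCCCCCCCC", "DDDDDDDDD", "EEEEEEEEE", "FFFFFFFFF"]
print(evolve(["********R", "*V******R"], binders, non_binders))
```

The `+ 1` in the denominator penalizes rules that match very few peptides, while the `d` term rewards more general rules with more don't-care positions.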
3. Experimental results and discussion
Our experiment includes the following steps: (1) rule extraction from a decision tree and a trained neural network; (2) rule generation by the GA-based model using the neural network rules as the initial population; (3) comparison of the rules from decision trees, neural networks and the KBGA with the well-known motifs in the HLA FactsBook. A rule has the form "IF condition, THEN class", where class is either MHC-binder or non-binder. The performance of a rule is evaluated by its sensitivity (SE), defined here as the proportion of true positives among all peptides matched by the rule:

Sensitivity = TP / (FP + TP)

A feed-forward neural network was configured with 180 (or 200) input nodes, 4 hidden nodes and 1 output node. The neural network was trained and tested by 3-fold cross-validation (with the number of binders kept above 20 at all times) [13]. We tried different numbers of hidden nodes and found that 4 hidden nodes produced the best results. We then extracted if-then rules from the trained neural network using the OAS algorithm and compared them with the data reported in the HLA FactsBook.

We extracted the best rules from a trained neural network and then enhanced the extracted rules by genetic evolution; the results are shown in Table 1. The rules "R@P9", "K@P9", "[V@P2/F@P3] ∧ R@P9" and "L@P2 ∧ K@P9" were generated for HLA-A*3301, for which no rules have been reported in the HLA FactsBook. The rules outperformed the known HLA facts. Coverage was reasonably high, with very high accuracy. Next, the GA-based model was used with domain knowledge incorporated initially. The domain knowledge was obtained by extracting rules from the neural networks. Finally, we compared the rule performance of the decision tree, the neural network and the knowledge-based genetic algorithm.
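As an illustration of how a rule in the "residue@position" notation can be evaluated, the sketch below converts a rule into its motif string and computes the ratio TP / (FP + TP) that this section calls sensitivity (in other literature this ratio is often called precision). The helper names and the toy peptides are hypothetical. The final line only reproduces the arithmetic for counts of 19 matched binders and 4 matched non-binders.

```python
import re


def rule_to_motif(rule, length=9):
    """Turn a rule such as 'V@P2 ^ R@P9' into a motif string such as '*V******R'."""
    motif = ["*"] * length
    for residue, position in re.findall(r"([A-Z])@P(\d+)", rule):
        motif[int(position) - 1] = residue  # positions in the rule are 1-based
    return "".join(motif)


def matches(motif, peptide):
    """True if the peptide agrees with the motif at every non-wildcard position."""
    return len(motif) == len(peptide) and all(
        m == "*" or m == p for m, p in zip(motif, peptide)
    )


def sensitivity(tp, fp):
    """TP / (FP + TP) as a percentage, following the definition in this section."""
    return 100.0 * tp / (fp + tp)


def evaluate(rule, binders, non_binders, length=9):
    """Return (TP, FP, sensitivity in %) for a rule, or None if it matches nothing."""
    motif = rule_to_motif(rule, length)
    tp = sum(matches(motif, p) for p in binders)
    fp = sum(matches(motif, p) for p in non_binders)
    if tp + fp == 0:
        return None
    return tp, fp, sensitivity(tp, fp)


# Toy peptides for illustration only.
binders = ["AVAAAAAAR", "ALAAAAAAR", "GVCAAAAAK"]
non_binders = ["AAAAAAAAR", "CCCCCCCCC"]
print(rule_to_motif("V@P2 ^ R@P9"))
print(evaluate("R@P9", binders, non_binders))
print(round(sensitivity(19, 4), 1))
```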
Table 1. The performance of the rules extracted from each classifier algorithm (HLA-A*3301, Standard Sensitivity +80%).

Classifier            Positive Rule     Motif        P/FP (Sensitivity)
HLA Fact              N/A
DT Rule / ANN Rule    R@P9              ********R    19/4 (82.6%)
                      K@P9              ********K    13/2 (86.7%)
                      V@P2 ^ R@P9       *V******R    6/0 (100%)
                      F@P3 ^ R@P9       **F*****R    5/0 (100%)
KBGA Rule             R@P9              ********R    19/4 (82.6%)
                      K@P9              ********K    13/2 (86.7%)
                      V@P2 ^ R@P9       *V******R    6/0 (100%)
                      F@P3 ^ R@P9       **F*****R    5/0 (100%)
                      L@P2 ^ K@P9       *L******K    5/0 (100%)

The results extracted from 6 alleles of the SYFPEITHI database and 32 alleles of the MHCPEP database are shown in the appendix (Table 2). The table shows only the 4-digit alleles among all alleles. Table 2 lists the rules with sensitivity greater than 70%. Some of the rules are shaded, indicating rules identified only by the given algorithm.

4. Conclusion
We extracted rules for predicting peptides binding to MHC class I using a decision tree (DT), neural networks (NN) and a knowledge-based genetic algorithm (KBGA). IF-THEN rules were extracted using the information on 6 HLA alleles from the SYFPEITHI database and 32 alleles from the MHCPEP database. When all the rules extracted in this study were compared with those previously reported in the literature [14], the KBGA was found to generate more rules than the other methods. For example, new rules which have never been reported to date were generated by the KBGA, such as "V@P9" for HLA-A*0204, "G@P1" for HLA-B*2706, and "R@P9", "K@P9", "[V@P2/F@P3] ∧ R@P9" and "L@P2 ∧ K@P9" for HLA-A*3301. The prediction rules generated by our system allow users to understand MHC binding peptides easily. Grouping and clustering of HLA alleles according to their binding rules may therefore be possible, which may be regarded as a kind of rule mining. To our knowledge, this work is the first to use a KBGA to predict MHC-peptide binding. Our KBGA method incorporates knowledge extracted from an ANN into the process of the genetic algorithm. This study showed that the KBGA can improve the performance of the GA: the KBGA was the best performer, followed by the ANN and then the DT. This finding indicates that the KBGA would be a useful approach for the analysis of bio-data.

Acknowledgement
This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MOST) (No. R13-2003-016-050020(2008)).

5. References
[1] A. Logean, D. Rognan, Recovery of known T-cell epitopes by computational scanning of a viral genome. Journal of Computer-Aided Molecular Design. 16(4), 229-243, 2002.
[2] A. Narayanan, E.C. Keedwell, B. Olsson, Artificial Intelligence Techniques for Bioinformatics. Bioinformatics. 1(4), 191-222, 2002.
[3] G.E. Crooks, G. Hon, J.M. Chandonia, S.E. Brenner, WebLogo: A Sequence Logo Generator. Genome Research. 1188-1190, 2004. http://weblogo.berkeley.edu
[4] H.G. Rammensee, J. Bachmann, N.P. Emmerich, O.A. Bachor, S. Stevanovic, SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics. Vol. 50, 213-219, 1999.
[5] H. Kim, Computationally Efficient Heuristics for If-Then Rule Extraction from Feed-Forward Neural Networks. Lecture Notes in Artificial Intelligence. Vol. 1967, 170-182, 2000.
[6] H. Mamitsuka, Predicting peptides that bind to MHC molecules using supervised learning of hidden Markov models. Proteins: Structure, Function and Genetics. 33, 460-474, 1998.
[7] J.R. Quinlan, C5.0 Online Tutorial, 2003. http://www.rulequest.com
[8] M. Honeyman, V. Brusic, N. Stone, L. Harrison, Neural network-based prediction of candidate T-cell epitopes. Nature Biotechnology. 966-969, 1998.
[9] M. Mitchell, An Introduction to Genetic Algorithms. MIT Press. 1996.
[10] N. Cristianini, J. Shawe-Taylor, Support vector machines and other kernel-based learning methods. Cambridge University Press. 2000.
[11] O. Schueler-Furman, Y. Altuvia, A. Sette, H. Margalit, Structure-based prediction of binding peptides to MHC class I molecules: application to a broad range of MHC alleles. Protein Sci. 9(9), 1838-1846, 2000.
[12] P. Baldi, S. Brunak, Bioinformatics, the machine learning approach. MIT Press, Cambridge Massachusetts, London England. 2004.
[13] P. Donnes, A. Elofsson, Prediction of MHC class I binding peptides, using SVMHC. BMC Bioinformatics. 3:25, 2002.
[14] S.G.E. Marsh, P. Parham, L.D. Barber, The HLA FactsBook. London, Academic Press. 2000.
[15] T. Joachims, SVMlight 6.01 Online Tutorial. 12, 2004. http://svmlight.joachims.org
[16] V. Brusic, G. Rudy, L.C. Harrison, MHCPEP, a database of MHC-binding peptides: update 1997. Nucleic Acids Research. Vol. 26, 368-371, 1998.
[17] V. Brusic, G. Rudy, M. Honeyman, J. Hammer, L. Harrison, Prediction of MHC class II-binding peptides using an evolutionary algorithm and artificial neural network. Bioinformatics. 14(2), 121-130, 1998.
[18] V. Brusic, V.B. Bajic, N. Petrovsky, Computational methods for prediction of T-cell epitopes - a framework for modelling, testing, and applications. Elsevier Inc, Science Direct. 436-443, 2004.
[19] Y. Altuvia, H. Margalit, A structure-based approach for prediction of MHC-binding peptides. Elsevier Inc, Science Direct. 454-459, 2004.
[20] Y.J. Cho, H. Kim, H.B. Oh, Prediction Rule Generation of MHC Class I Binding Peptides using ANN and GA. Vol. 3610, 1009-1016, Springer-Verlag, ICNC2005.
[21] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin Heidelberg New York. 1996.
APPENDIX Table 2. The performance of the rules extracted from each classifier algorithm. ▸HLA-A1 Standard Sensitivity + 75%. Classifier Positive Rule Motif HLA-Fact N/A Y@P9 ********Y DT_Rule D@P3 ^ Y@P9 **D*****Y ANN_Rule P@P4 ^ Y@P9 ***P****Y T@P2 ^ Y@P9 *T******Y Y@P9 ********Y D@P3 **D****** KBGA_Rule Y@P9 ********Y D@P3 ^ Y@P9 **D*****Y P@P4 ^ Y@P9 ***P****Y T@P2 ^ Y@P9 *T******Y ▸HLA-A*1101 Standard Sensitivity + 75%. Classifier Positive Rule Motif K@P9 ********K HLA-Fact K@P9 ********K DT_Rule L@P2 ^ K@P9 *L******K ANN_Rule V@P2 ^ K@P9 *V******K K@P9 ********K KBGA_Rule L@P2 ^ K@P9 *L******K V@P2 ^ K@P9 *V******K ▸HLA-A*2402 Standard Sensitivity+ 75%. Classifier Positive Rule Motif F@P2 *F******* HLA-Fact I@P9 ********I F@P9 ********F Y@P2 *Y******* Y@P2 ^ L@P9 *Y******L* Y@P2 ^ L@P9 *Y******L* DT_Rule Y@P2 ^ L@P9 *Y******L* ANN_Rule F@P2 *F******* KBGA_Rule I@P9 ********I F@P9 ********F Y@P2 *Y******* Y@P2 ^ L@P9 *Y******L* ▸HLA-A*0201 Standard Sensitivity+ 75%. Classifier Positive Rule Motif L@P2 *L******* HLA-Fact V@P9 ********V L@P2 ^ V@P9 *L******V L@P2 *L******* DT_Rule L@P2 ^ V@P9 *L******V ANN_Rule L@P2 *L******* KBGA_Rule V@P9 ********V L@P2 ^ V@P9 *L******V ▸HLA-A*0204 Standard Sensitivity+ 75%.
Positive_num 26 12 10 10 26 14 26 12 10 10
(28) (56) Negative_num Sensitivity (%) 1 0 0 0 1 1 1 0 0 0
96.30 100.00 100.00 100.00 96.30 93.33 96.30 100.00 100.00 100.00 (40) (80) Positive_num Negative_num Sensitivity (%) 29 4 87.88 29 4 87.88 7 1 87.50 9 0 100.00 29 4 87.88 7 1 87.50 9 0 100.00 (73) (146) Positive_num Negative_num Sensitivity (%) 16 5 76.19 20 5 80.00 14 3 82.35 46 5 90.20 28 2 93.33 28 2 93.33 28 2 93.33 16 5 76.19 20 5 80.00 14 3 82.35 46 5 90.20 28 2 93.33 (184) (368) Positive_num Negative_num Sensitivity (%) 123 42 75.00 63 19 76.83 43 1 97.73 123 42 75.00 43 1 97.73 123 42 75.00 63 19 76.83 43 1 97.73 (22) (44)
Classifier HLA-Fact DT_Rule ANN_Rule KBGA_Rule
Positive Rule Motif N/A N/A P@P5 ^ V@P9 ****P***V V@P9 ********V P@P5 ^ V@P9 ****P***V ▸HLA-A*3301 Standard Sensitivity+ 75%. Classifier Positive Rule Motif HLA-Fact N/A R@P9 *********R DT_Rule K@P9 *********K V@P2 ^ R@P9 *V******R ANN_Rule F@P3 ^ R@P9 **F*****R R@P9 ********R KBGA_Rule K@P9 ********K V@P2 ^ R@P9 *V******R F@P3 ^ R@P9 **F*****R L@P2 ^ K@P9 *L******K ▸HLA-A*0301 Standard Sensitivity+ 75%. Classifier Positive Rule Motif K@P9 ********K HLA-Fact L@P2 ^ K@P9 *L******K K@P9 ********K DT_Rule L@P2 ^ K@P9 *L******K ANN_Rule I@P2 ^ K@P9 *I******K K@P9 ********K KBGA_Rule L@P2 ^ K@P9 *L******K I@P2 ^ K@P9 *I******K E@P4 ^ K@P9 ***E****K ▸HLA-A31 Standard Sensitivity+ 75%. Classifier Positive Rule Motif HLA-Fact N/A R@P9 ********R DT_Rule K@P9 ********K V@P2 ^ R@P9 *V******R ANN_Rule F@P3 ^ R@P9 **F*****R R@P9 ********R KBGA_Rule K@P9 ********K V@P2 ^ R@P9 *V******R F@P3 ^ R@P9 **F*****R ▸HLA-A*6801 Standard Sensitivity+ 70%. Classifier Positive Rule Motif V@P2 *V******* HLA-Fact K@P9 ********K R@P9 ********R V@P2 ^ R@P9 *V******R V@P2 ^ R@P9 *V******R* DT_Rule V@P2 ^ R@P9 *V******R ANN_Rule V@P2 *V******* KBGA_Rule K@P9 ********K R@P9 ********R V@P2 ^ R@P9 *V******R
Positive_num
4 8 4 Positive_num 19 13 6 5 19 13 6 5 5 Positive_num 20 8 20 8 5 20 8 5 5 Positive_num 25 13 6 6 25 13 6 6 Positive_num 14 19 22 9 9 9 14 19 22 9
Negative_num
0 1 0 (32) Negative_num
Sensitivity (%)
100.00 88.89 100.00 (64) Sensitivity (%)
4 2 0 0 4 2 0 0 0
82.61 86.67 100.00 100.00 82.61 86.67 100.00 100.00 100.00 (38) (76) Negative_num Sensitivity (%) 1 95.24 0 100.00 1 95.24 0 100.00 0 100.00 1 95.24 0 100.00 0 100.00 0 100.00 (39) (78) Negative_num Sensitivity (%) 5 2 0 0 5 2 0 0
83.33 86.67 100.00 100.00 83.33 86.67 100.00 100.00 (42) (84) Negative_num Sensitivity (%) 6 70.00 8 70.37 9 70.97 0 100.00 0 100.00 0 100.00 6 70.00 8 70.37 9 70.97 0 100.00
▸HLA-B8 Standard Sensitivity+ 75%. Classifier Positive Rule Motif HLA-Fact N/A DT_Rule N/A K@P3 ^ K@P5 **K*K**** ANN_Rule K@P3 ^ L@P9 **K*****L L@P9 ********L KBGA_Rule K@P5 ****K**** K@P3 **K****** K@P3 ^ K@P5 **K*K**** K@P3 ^ L@P9 **K*****L ▸HLA-B*2705 Standard Sensitivity+ 70%. Classifier Positive Rule Motif R@P2 *R******* HLA-Fact R@P2 *R******* DT_Rule R@P2 *R******* ANN_Rule R@P2 *R******* KBGA_Rule K@P9 ********K G@P1 G******** G@P1 ^ R@P2 GR******* R@P1 ^ R@P2 RR******* ▸HLA-B*3501 Standard Sensitivity + 75%. Classifier Positive Rule Motif P@P2 *P******* HLA-Fact P@P2 *P******* DT_Rule P@P2 ^ L@P9 *P******L ANN_Rule L@P9 ********L KBGA_Rule P@P2 *P******* P@P2 ^ L@P9 *P******L ▸HLA-B*2703 Standard Sensitivity + 75%. Classifier Positive Rule Motif R@P2 *R******* HLA-Fact R@P2 *R******* DT_Rule R@P2 *R******* ANN_Rule R@P2 *R******* KBGA_Rule R@P2 ^ R@P9 *R******R ▸HLA-B*5301 Standard Sensitivity + 75%. Classifier Positive Rule Motif P@P2 *P******* HLA-Fact P@P2 *P******* DT_Rule F@P1 ^ P@P2 FP******* ANN_Rule P@P2 *P******* KBGA_Rule P@P2 ^ I@P9 *P******I P@P2 ^ L@P9 *P******L P@P2 ^ F@P9 *P******F F@P1 ^ P@P2 FP******* ▸HLA-B*2706 Standard Sensitivity + 70%. Classifier Positive Rule Motif R@P2 *R******* HLA-Fact R@P2 *R******* DT_Rule R@P2 ^ G@P4 *R*G***** ANN_Rule R@P2 ^ S@P5 *R**S****
Positive_num
10 9 12 13 16 10 9 Positive_num 7 7 7 7 9 38 9 9 Positive_num 65 65 22 22 65 22 Positive_num 20 20 20 20 6 Positive_num 41 41 10 41 10 11 9 10 Positive_num 19 19 4 4
(26) (52) Negative_num Sensitivity (%)
0 0 4 4 4 0 0 (41) Negative_num 3 3 3 3 2 1 0 0 (67) Negative_num 6 6 0 3 6 0 (22) Negative_num 2 2 2 2 0 (41) Negative_num 4 4 0 4 1 0 0 0 (20) Negative_num 3 3 0 0
100.00 100.00 75.00 76.47 80.00 100.00 100.00 (82) Sensitivity (%) 97.44 97.44 97.44 97.44 70.00 81.82 100.00 100.00 (134) Sensitivity (%) 91.55 91.55 100.00 88.00 91.55 100.00 (44) Sensitivity (%) 90.91 90.91 90.91 90.91 100.00 (82) Sensitivity (%) 91.11 91.11 100.00 91.11 90.91 100.00 100.00 100.00 (40) Sensitivity (%) 86.36 86.36 100.00 100.00
R@P2 *R******* G@P1 G******** G@P1 ^ R@P2 GR******* R@P2 ^ G@P4 *R*G***** R@P2 ^ S@P5 *R**S**** R@P2 ^ G@P4 *R*G***** R@P2 ^ S@P5 *R**S**** ▸HLA-B*5102 Standard Sensitivity + 75%. Classifier Positive Rule Motif P@P2 *P******* HLA-Fact I@P9 ********I P@P2 ^ I@P9 *P******I P@P2 *P******* DT_Rule P@P2 ^ I@P9 *P******I ANN_Rule P@P2 *P******* KBGA_Rule I@P9 ********I P@P2 ^ I@P9 *P******I ▸HLA-B*0702 Standard Sensitivity + 75%. Classifier Positive Rule Motif P@P2 *P******* HLA-Fact P@P2 ^ L@9 *P******L P@P2 *P******* DT_Rule P@P2 ^ L@9 *P******L ANN_Rule P@P2 *P******* KBGA_Rule P@P2 ^ L@9 *P******L ▸HLA-B*5103 Standard Sensitivity + 75%. Classifier Positive Rule Motif I@P9 ********I HLA-Fact I@P9 ********I DT_Rule P@P2 ^ I@P9 *P******I ANN_Rule P@P2 *P******* KBGA_Rule I@P9 ********I P@P2 ^ I@P9 *P******I ▸HLA-B*5401 Standard Sensitivity + 75%. Classifier Positive Rule Motif P@P2 *P******* HLA-Fact P@P2 *P******* DT_Rule P@P2 *P******* ANN_Rule P@P2 *P******* KBGA_Rule F@P1 ^ P@P2 FP******* L@P1 ^ P@P2 LP******* P@P2 ^ I@P9 *P******I ▸HLA-B*5101 Standard Sensitivity + 75%. Classifier Positive Rule Motif I@P9 ********I HLA-Fact P@P2 *P******* DT_Rule P@P2 ^ I@P9 *P******I ANN_Rule P@P2 *P******* KBGA_Rule I@P9 ********I P@P2 ^ I@P9 *P******I KBGA_Rule
19 5 5 4 4 4 4 Positive_num 19 18 11 19 11 19 18 11 Positive_num 51 16 51 16 51 16 Positive_num 18 18 11 18 18 11 Positive_num 42 42 42 42 10 10 10 Positive_num 20 24 13 24 20 13
3 2 0 0 0 0 0 (29) Negative_num 3 2 0 3 0 3 2 0 (52) Negative_num 6 1 6 1 6 1 (29) Negative_num 2 2 0 6 2 0 (42) Negative_num 1 1 1 1 0 0 0 (35) Negative_num 2 3 0 3 2 0
86.36 71.43 100.00 100.00 100.00 100.00 100.00 (58) Sensitivity (%) 86.36 90.00 100.00 86.36 100.00 86.36 90.00 100.00 (104) Sensitivity (%) 89.47 94.12 89.47 94.12 89.47 94.12 (58) Sensitivity (%) 90.00 90.00 100.00 75.00 90.00 100.00 (84) Sensitivity (%) 97.67 97.67 97.67 97.67 100.00 100.00 100.00 (70) Sensitivity (%) 90.91 88.89 100.00 88.89 90.91 100.00