Multiagent Based Bio-data Mining
Pengyi Yang1, Li Tao2, Liang Xu2 and Zili Zhang2,3
1 Advanced Networks Research Group, School of Information Technologies (J12), The University of Sydney, NSW 2006, Australia
2 Faculty of Computer and Information Science, Southwest University, Chongqing 400715, China
3 School of Engineering and Information Technology, Deakin University, Geelong, Victoria 3217, Australia
[email protected]
Abstract. This paper argues for applying multiagent based data mining technologies to biological data analysis. The rationale is described from multiple perspectives with an emphasis on the biological context. Following that, an initial multiagent based bio-data mining framework is conceived, and a prototype system is developed to demonstrate how it helps biologists, who are often unfamiliar with data mining technologies, to perform a comprehensive mining task for answering biological questions. The system offers a new way to reuse biological datasets and available data mining algorithms to their fullest.
1 Introduction
The unprecedentedly fast development of molecular biology is driven by modern high-throughput data generating technologies. The massive amount of data accumulated over the last two decades covers a full spectrum of biological aspects and promises to lift our view to a higher level: systems biology [1]. Yet such vast collections of data are not in themselves meaningful. To extract useful biological information and knowledge from the raw data, various data mining strategies and their hybrids have been applied [2–5]. Owing to the high expense, the labor involved, and most importantly the different levels of analysis (genome, transcriptome, proteome, etc.) and the various data generating protocols (sequencing, genotyping, microarray, serial analysis of gene expression or SAGE, mass spectrometry or MS, etc.), biological data are largely distributed across different databases around the globe, with heterogeneous characteristics and formats [6]. However, the available data mining strategies and their hybrids are often determined by the problem formulation and require careful preparation and editing before they can be applied to a specific problem. This places a heavy load on researchers who want to find answers to general biological problems. It also makes reusing a previously developed data mining program very difficult and time consuming. Many researchers therefore prefer to write their own code from scratch instead of reusing currently available programs. Such problems are even more pronounced when the user is unfamiliar with the technical details of data mining.
P. Yang et al.
To address the difficulties of reusability and make bio-data mining easily accessible to biological researchers who are unfamiliar with any specific data mining algorithm, we propose an agent driven data mining framework for biological data analysis: multiagent based bio-data mining. The system hides the data mining details from users and attempts to provide as many available results as possible for a given enquiry. It helps biological researchers view the enquired problems from a higher level by combining results from multiple levels and sources, and makes data mining easy for non-experts to apply. The paper is organized as follows: Section 2 argues for applying such an agent driven data mining framework in the context of biological data analysis. Section 3 provides an overview of the proposed framework. Section 4 details the experimental design and provides some preliminary results, while Section 5 concludes the paper.
2 Why Multiagent Based Bio-data Mining?
In this section, we present the rationale for introducing multiagent based bio-data mining for biological data analysis from different perspectives.
2.1 Hiding Technical Details from Laymen
The ultimate goal of bio-data mining is to provide meaningful biological information and knowledge for a better understanding of the organism being investigated. Therefore, the target users of mining programs and algorithms should be biologists. However, since bio-data mining is by nature a data-driven process, most bio-data mining programs assume that the user has at least moderate knowledge of data preparation and data mining, and require him or her to select an appropriate algorithm from a large number of candidates for a specific biological problem. Unfortunately, such requirements are unreasonable for most biologists. Agent-based bio-data mining leaves the technical details of choosing mining algorithms, forming hybrid systems, and preparing specific data formats to the intelligent system itself. Thus, it helps to alleviate the technical difficulty, to make full use of the available data, and to enhance the reusability of bio-data mining algorithms.
2.2 Data Format
One major difficulty in reusing a previously developed program for bio-data mining is that many biological datasets are generated and stored in different data formats. Take microarray data as an example. While many mining algorithms require the genes and samples/conditions to be represented as a data matrix, many microarray datasets are actually stored as one gene vector per sample/condition in multiple files. This situation probably arises because different laboratories often use different protocols and different technologies for data generation and acquisition. Moreover, the ever-changing technologies make the standardization of the
bio-data even harder. As one may expect, this has a similar effect on program and algorithm development: different mining programs often impose different requirements on data format. For example, the programs developed by Ding and Peng (mRMR) [7], Eisen et al. (Cluster) [8], and Li et al. (GA/KNN) [9] for microarray data analysis each impose slightly different requirements on the data format. Even such a slight difference in format requirements may force the analyst to go carefully through the data format manual many times during data preparation; otherwise the program will produce erroneous results or simply won’t work. By applying an agent based data mining framework, we can leave the data format details to the agents that actually carry out the dirty work, and the reusability of both data and algorithms can be greatly improved.
2.3 Parallel Analyzing
A multiagent system (MAS) is a powerful technology for dealing with system complexities and for tackling open complex problems [10]. It provides an architecture for distributed computing [11], and is primarily designed to solve computationally intensive problems by delegating the task of exploring different portions of the computation and data analysis to different agents, with the objective of expediting the search [12]. We believe that such an architecture is extremely suitable for bio-data analysis because data mining is often a computationally intensive and time consuming procedure. By applying multiagent based distributed bio-data mining, the computing load can be balanced and the computation can be carried out in parallel. Such a framework can not only speed up the overall mining process but also bring together multiple sources of information for answering a given biological problem. We call this bio-information fusion.
2.4 Agent-Based Hybrid Construction
Many data mining algorithms have been successfully applied to bioinformatics. Some examples are genetic algorithms (GA) [13], neural networks, and support vector machines (SVM) [14]. However, recent developments indicate that in many cases one technique is not sufficient to solve a problem, often due to the nature of the problem or because no single algorithm fits the problem requirements. With this observation, we have witnessed a boom in hybrid systems over the last few years [2, 3, 15–17]. Yet there are numerous ways in which algorithms can be combined. In our previous studies, we demonstrated that a specially designed agent-based framework can be utilized to create an efficient hybrid system in a short time [18, 19]. With such an agent-based hybrid creation system, mining algorithms can be added to the system dynamically at run-time, and the flexibility and robustness of the system are greatly improved.
2.5 Mining Multiple Levels of Data
One unique feature of biological data is that they range from very basic DNA sequence information to 3-dimensional protein structure information. As
indicated in Figure 1, we divide them broadly into three major groups, namely genomic data, transcriptomic data, and proteomic data, corresponding to nucleotide, gene expression, and protein analyses. Traditionally, a given biological enquiry is performed by applying certain data mining algorithms to a specific biological data type. However, biological systems are by nature associated with each other. To obtain an in-depth understanding of the underlying mechanisms, mining multiple levels of data may offer a more holistic picture. A multiagent bio-data mining framework offers an efficient way to organize and mine multiple levels of bio-data with ease.
[Figure 1. Genomic Level: DNA sequence, SNPs. Transcriptomic Level: Microarray, RNA blotting, SAGE. Proteomic Level: Protein sequence, Protein 3-D structure, MS.]
Fig. 1. Biological Data. Biological data can be divided into three levels. Each block indicates a type of data generated by a specific technology in a given level.
2.6 Mining Same Level of Data Generated by Different Technologies
Within a level, we may have different types of data generated by different technologies (Figure 1). Take the transcriptomic level as an example: two gene expression profiling technologies are widely used, serial analysis of gene expression (SAGE) [20] and DNA microarray [21]. SAGE data consist of a list of thousands of sequence tags and the number of times each tag is observed in a particular tissue across different samples or conditions, whereas microarrays present gene expression as hybridization abundance across different samples or conditions. A multiagent based bio-data mining framework offers the capability to mine different types of data simultaneously and combine the results. In this way, multiple outcomes can be used for validating and confirming mining results.
2.7 Mining Same Data From Multiple Sources
In many cases, once generated, a biological dataset may be pre-processed with different criteria and stored in different databases with different formats. This may be because pre-processing and pre-filtering procedures are very common and are themselves a very active research direction [17].
Unfortunately, it is also very hard to tell which pre-processing or pre-filtering procedure will produce the “best” dataset for follow-up mining procedures. When the same data mining algorithm is applied, datasets pre-processed or pre-filtered with different steps and stored in different formats may give quite different mining outcomes. This leads to inconsistent results and makes it very difficult for biologists to choose which results to rely on. To enhance reliability and efficiency, one can employ different mining algorithms to mine the different versions of the same dataset and assess the mining results collaboratively. This gives a less biased and more objective analysis, and helps biologists to identify genuine factors associated with the biological phenomenon of interest.
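The collaborative assessment described in Section 2.7 can be illustrated with a simple vote count. This is only a sketch under stated assumptions: each mining run (one algorithm applied to one version of a dataset) is assumed to return a set of selected features, and the function name `consensus_features`, the `min_votes` threshold, the run labels and the feature names are all hypothetical rather than part of the proposed system.

```python
from collections import Counter


def consensus_features(runs, min_votes=2):
    """Return (feature, votes) pairs selected by at least `min_votes` runs.

    `runs` maps a run label (dataset version plus algorithm) to the set of
    features that run selected. Labels and threshold are assumptions.
    """
    votes = Counter()
    for selected in runs.values():
        votes.update(set(selected))  # each run votes at most once per feature
    # Sort by vote count (descending), then by name, for a stable report.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(feature, n) for feature, n in ranked if n >= min_votes]


# Hypothetical example: two versions of one dataset, two kinds of algorithm.
runs = {
    "version_A/filter": {"g1", "g2", "g3"},
    "version_A/wrapper": {"g1", "g3", "g4"},
    "version_B/filter": {"g1", "g5"},
}
consensus = consensus_features(runs)
```

Features that survive the vote are supported by several independent runs, which is one possible way to reduce the bias of any single pre-processing choice.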
3 Bio-data Mining of Human Diseases: A Case Study
3.1 An Initial Framework
The overview of the proposed framework is visualized in Figure 2. At the top, multiple users are connected with the mining task planning agent and the aggregation agent through the interface agents. At the bottom, various bio-datasets are hosted by different databases. Each database is managed by a mining agent which has access to various mining algorithms. Each mining agent is connected with the planning agent for receiving mining tasks and with the aggregation agent for returning mining results. One or more mining tasks of an end user are first collected by the interface agent and sent to the planning agent. After receiving the mining task or tasks, the planning agent produces a mining plan based on the requirements of the mining tasks and the current mining load of each mining agent. It then delegates the mining tasks to different mining agents. After receiving a mining task, the mining agent matches the input enquiries against the dataset keywords based on the bioinformatics ontology [22]. Once it has matched a dataset, the mining agent applies or forms a well suited mining algorithm or hybrid for bio-data mining, based on its knowledge of the characteristics of the available datasets and the available mining algorithms. When all mining processes are finished, the aggregation agent aggregates the results and presents them to the end user.
[Figure 2. From top to bottom: end users; interface agents; aggregation agent and planning agent, connected via ACL, with a yellow page and the bioinformatics ontology; agent based databases, each managed by a mining agent with available data mining algorithms and hosting microarray, SAGE, MS, sequence and SNP data.]
Fig. 2. Overview of the initial multiagent based bio-data mining framework.
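The delegation step of Section 3.1 can be sketched as follows. This is an illustration only: the paper states that the plan depends on the task requirements and on the current mining load of each mining agent, while the `MiningAgent` class, its `load` counter, and the least-loaded selection rule used here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MiningAgent:
    name: str
    datasets: set                      # datasets hosted by this agent's database
    load: int = 0                      # tasks currently assigned (assumed load metric)
    queue: list = field(default_factory=list)


def produce_plan(tasks, agents):
    """Assign each (task_id, dataset) pair to the least-loaded capable agent."""
    plan = {}
    for task_id, dataset in tasks:
        capable = [a for a in agents if dataset in a.datasets]
        if not capable:
            plan[task_id] = None       # no agent hosts the requested dataset
            continue
        chosen = min(capable, key=lambda a: a.load)
        chosen.load += 1               # delegation increases the agent's load
        chosen.queue.append(task_id)
        plan[task_id] = chosen.name
    return plan


# Hypothetical agents and tasks; dataset names are placeholders.
agents = [MiningAgent("A", {"D1", "D2"}), MiningAgent("B", {"D2"})]
plan = produce_plan([("t1", "D1"), ("t2", "D2"), ("t3", "D2"), ("t4", "D9")], agents)
```

In this sketch a task for a dataset hosted by several agents goes to whichever agent currently has the fewest assigned tasks, so the load stays balanced as tasks arrive.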
3.2 System Analysis and Design
System analysis and design are the most important steps toward agent based system implementation [23]. Here we adopt one of the most popular agent-oriented analysis and design approaches, the Gaia methodology [24], for developing the system structure and for modeling roles and interactions. Currently, the initial system has the following roles: Interface Agent (UserHandler), Planning Agent (MiningPlanner), Mining Agent (DataMiner), and Aggregation Agent (Aggregator). Using the Gaia methodology, two of the most important roles (DataMiner and MiningPlanner) are formally represented in Table 1 and Table 2. The interaction protocols associated with the roles DataMiner and MiningPlanner are detailed in Figure 3 and Figure 4, respectively.
[Figure 3. Protocol ReceiveTask (DataMiner, MiningPlanner): receive the mining task from the MiningPlanner. Protocol SendMinedResults (DataMiner, Aggregator): send the mining results to the Aggregator.]
Fig. 3. Definition of protocols associated with DataMiner.
Table 1. Mining agent role schema.
Role Schema: DataMiner
Description: Selects significant biomarkers and classifies samples by conditions etc.
Protocols and Activities: ReceiveTask, AccessData, MatchData, SelectAlgorithm, FormHybrid, MiningData, SendMinedResults
Permissions:
  accesses   Datasets      // Datasets to be mined
             Algorithms    // bio-data mining algorithms available to the mining agent
  generates  Hybrids       // hybrid system for mining certain bio-data
             MinedResults  // Mining the datasets and generate results
Responsibilities
  Liveness: DataMiner = ReceiveTask.AccessData.MatchData+.SelectAlgorithm.[FormHybrid]+.MiningData.SendMinedResults
  Safety:
    · Data sources are available
    · Data mining algorithms are available

Table 2. Planning agent role schema.
Role Schema: MiningPlanner
Description: Generates plans for bio-data mining based on the observation of current work load of each mining agent and the requirements of the tasks.
Protocols and Activities: GetMiningTasks, GetCapabilities, GetCapacities, DelegateTasks, ProducePlan
Permissions:
  reads      MiningTask    // a mining task or requirement provided by end user(s)
             Capabilities  // capabilities of each mining agent
             Capacities    // capacities of each mining agent
  generates  MiningPlan    // work plan for bio-data mining
Responsibilities
  Liveness: MiningPlanner = GetMiningTasks.GetCapabilities.GetCapacities+.ProducePlan.DelegateTasks
  Safety:
    · MiningTask is clear and available
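To make the liveness expressions in Tables 1 and 2 concrete, the sketch below translates such an expression into a regular expression and checks whether a recorded sequence of activities satisfies it. It is an illustration rather than part of the Gaia methodology or of the prototype: the helper names are hypothetical, only the operators that appear in the two tables are handled (`x.y` for sequence, `x+` for one or more occurrences, `[x]` for an optional activity), and `[x]+` is interpreted here as zero or more occurrences, which is an assumption.

```python
import re

DATAMINER_LIVENESS = (
    "ReceiveTask.AccessData.MatchData+.SelectAlgorithm."
    "[FormHybrid]+.MiningData.SendMinedResults"
)

# Regex quantifier for each (optional, repeated) combination.
# Assumption: an optional and repeated term "[x]+" means zero or more.
_SUFFIX = {(False, False): "", (True, False): "?",
           (False, True): "+", (True, True): "*"}


def liveness_to_regex(expression):
    """Translate a sequence-only liveness expression into a compiled regex."""
    parts = []
    for term in expression.split("."):
        repeated = term.endswith("+")
        if repeated:
            term = term[:-1]
        optional = term.startswith("[") and term.endswith("]")
        if optional:
            term = term[1:-1]
        # Each activity name is followed by one space in the encoded trace.
        parts.append("(?:" + re.escape(term) + " )" + _SUFFIX[(optional, repeated)])
    return re.compile("".join(parts))


def satisfies(trace, expression):
    """Check whether a list of activity names matches the liveness expression."""
    encoded = "".join(name + " " for name in trace)
    return liveness_to_regex(expression).fullmatch(encoded) is not None


trace = ["ReceiveTask", "AccessData", "MatchData", "MatchData",
         "SelectAlgorithm", "FormHybrid", "MiningData", "SendMinedResults"]
ok = satisfies(trace, DATAMINER_LIVENESS)
```

A trace that skips a mandatory activity such as AccessData is rejected, while the FormHybrid step may be omitted.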
4 Experimental Design
4.1 Datasets
[Figure 4. Protocol GetMiningTasks (MiningPlanner, UserHandler): ask the UserHandler to provide the mining task. Protocol GetCapabilities (MiningPlanner, DataMiner): get the DataMiner capabilities. Protocol GetCapacities (MiningPlanner, DataMiner): get the current capacities of the DataMiner. Protocol DelegateTasks (MiningPlanner, DataMiner): delegate the mining task to the mining agent.]
Fig. 4. Definition of protocols associated with MiningPlanner.
Table 3 summarizes the datasets used in the system demonstration. The “Ontology Keywords” column provides the keywords used for enquiry matching. The ontology is constructed by integrating GHDO (Generic Human Disease Ontology) [25] and TaO (TAMBIS Ontology) [22]. The “Features”, “Samples”, “Class”, and “Format” columns are used by the mining agents as the data characteristics. As can be seen, many diseases have been studied from multiple aspects using different analysis technologies, and the data are in various formats, which makes them suitable for testing the proposed multiagent based bio-data mining system.
Table 3. Datasets descriptions.
Dataset          Features  Samples  Class  Format    Ontology Keywords
AMD [26]         116,204   146      2      Matrix a  SNP; Complex Disease; Age-related Macular Degeneration
Leukemia1 [27]   7,129     72       2      Arff b    Microarray; Leukemia
Leukemia2 [27]   3,571     72       2      Matrix    Microarray; Leukemia
MLL [28]         12,582    72       3      Matrix    Microarray; Leukemia; Subtypes
Breast1 [29]     305       15       2      Matrix    SAGE; Breast; Cancer
Breast2 [30]     24,481    97       2      C4.5 c    Microarray; Breast; Cancer
Prostate1 [31]   15,154    322      4      Arff      MS; Prostate; Cancer
Prostate2 [32]   12,600    136      2      C4.5      Microarray; Prostate; Cancer
a A sample matrix format with sample id in the first column and feature id in the first row.
b A data format standard of Weka data mining package.
c A C4.5 data format standard with feature ids and values are stored in two separate files.
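As an illustration of how the “Ontology Keywords” column can drive enquiry matching, the sketch below uses the keywords of Table 3 with a plain case-insensitive comparison. The real system matches enquiries through the integrated ontology, so exact keyword comparison and the function name `match_datasets` are simplifying assumptions made for this example.

```python
# Ontology keywords copied from Table 3.
DATASET_KEYWORDS = {
    "AMD": ["SNP", "Complex Disease", "Age-related Macular Degeneration"],
    "Leukemia1": ["Microarray", "Leukemia"],
    "Leukemia2": ["Microarray", "Leukemia"],
    "MLL": ["Microarray", "Leukemia", "Subtypes"],
    "Breast1": ["SAGE", "Breast", "Cancer"],
    "Breast2": ["Microarray", "Breast", "Cancer"],
    "Prostate1": ["MS", "Prostate", "Cancer"],
    "Prostate2": ["Microarray", "Prostate", "Cancer"],
}


def match_datasets(enquiry, keywords=DATASET_KEYWORDS):
    """Return the datasets that list a keyword equal to the enquiry.

    Comparison ignores case and surrounding whitespace; a real ontology
    would also resolve synonyms and broader or narrower terms.
    """
    wanted = enquiry.strip().lower()
    return [name for name, words in keywords.items()
            if wanted in (w.lower() for w in words)]


leukemia_hits = match_datasets("Leukemia")
cancer_hits = match_datasets("Cancer")
```

With these keywords, the enquiry “Leukemia” selects three datasets and “Cancer” selects four, which agrees with the matches reported in Section 4.3.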
4.2 Deployment and Implementation
We house the above datasets on three different computers connected by an intranet, as follows:
Computer   IP address       System           Datasets
Computer1  192.168.208.110  Fedora 5         AMD, Leukemia1
Computer2  192.168.208.111  Fedora 5         Breast1, MLL, Prostate2
Computer3  192.168.208.112  XP Professional  Breast2, Prostate1, Leukemia2
Communication follows the FIPA ACL message structure specification [33], and the databases are agentified by adding transducers (mining agents) on top of the systems for request translation and mining algorithm invocation [34]. Another server is used as the planning agent and aggregation agent for generating work plans and delegating mining tasks. For forming hybrid systems, we devise the following guidelines:
IF the number of features is large and the work load is high, THEN use available filtering algorithms for fast feature reduction.
IF the number of features is small and the work load is low, THEN apply available wrapper algorithms for accurate feature selection.
IF the numbers of features and samples are large, and the dataset has multiple classes, THEN use a computationally efficient classifier (for example, C4.5 or kNN) or an ensemble of classifiers for sample classification.
IF the numbers of features and samples are small, and the dataset has two classes, THEN apply a computationally intensive but accurate classifier (for example, SVM or MLP) or an ensemble of classifiers for sample classification.
IF the correlations of the features are high, THEN utilize clustering algorithms for correlation reduction [2].
Notice that the qualitative terms in these guidelines (for example large, small, high, and low) are ambiguous and can be considered as environment factors. A given factor may be considered true in certain circumstances and false in others, depending on the state of the agents.
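The guidelines above can be sketched as a small rule function. This is a hedged illustration, not the prototype's implementation: the function name `choose_strategy` and every numeric threshold are hypothetical, since the paper deliberately leaves terms such as large and high to be decided by the state of the agents.

```python
def choose_strategy(n_features, n_samples, n_classes, workload, correlation,
                    large_features=10_000, large_samples=100,
                    high_load=0.7, high_corr=0.8):
    """Map dataset characteristics and agent state to suggested mining steps.

    All thresholds are illustrative assumptions, not values from the paper.
    Steps are listed in the order of the guidelines, not in execution order.
    Combinations not covered by any guideline simply yield no step.
    """
    many_features = n_features >= large_features
    many_samples = n_samples >= large_samples
    busy = workload >= high_load

    steps = []
    if many_features and busy:
        steps.append("filter-based feature reduction")
    elif not many_features and not busy:
        steps.append("wrapper-based feature selection")
    if many_features and many_samples and n_classes > 2:
        steps.append("efficient classifier (e.g. C4.5 or kNN)")
    elif not many_features and not many_samples and n_classes == 2:
        steps.append("accurate classifier (e.g. SVM or MLP)")
    if correlation >= high_corr:
        steps.append("clustering for correlation reduction")
    return steps


# Characteristics resembling Prostate1 in Table 3; workload and correlation
# values are made up for the example.
suggested = choose_strategy(15_154, 322, 4, workload=0.9, correlation=0.5)
```

Passing the thresholds as parameters mirrors the remark that the same factor may be judged differently depending on the agents' state: an agent could tighten or relax them at run-time.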
4.3 Results
A prototype system was developed based on the proposed framework to illustrate how it works. Figure 5 shows the user interface. We present the experimental results for three different mining inputs: “Complex Disease”, “Leukemia”, and “Cancer”. Tables 4-6 provide the detailed results for each input. In Table 4, the input enquiry “Complex Disease” matches the dataset “AMD” in agent database system 192.168.208.110. The selected SNP markers/factors (rs1027438, rs380390, rs10490924, rs1420150, Age) are presented with the sample classification accuracy (68.5%). In Table 5, the input enquiry “Leukemia” matches three datasets from multiple databases (192.168.208.110; 192.168.208.111; 192.168.208.112). The first two, namely Leukemia1 and Leukemia2, are the same dataset pre-processed with different pre-filtering procedures [35] and stored in different data formats. The third is generated by another leukemia study [28]. The mining results provide not only the selected genes and sample classification accuracy of each dataset, but also the genes that overlap between different mining results. As to the input enquiry “Cancer”, four datasets match it and the system provides the mining results of each dataset. Notice that for the breast cancer datasets, the results include those generated from a SAGE study [29] and from a microarray study [30]. For the prostate cancer datasets, the results include those generated from an MS study [31] and from a microarray study [32]. These results give a multi-level view of the enquired biological problems. With more datasets and systems from different biological studies and experiments being integrated, this framework should be able to provide a more holistic picture for the analyst to view a given biological problem from multiple aspects.
Fig. 5. User interface of the multiagent based bio-data mining system.
Table 4. Mining results for the input “Complex Disease”.
Input: “Complex Disease”
Results:
  Dataset: AMD
  Data Type (Level): SNP (Genomics)
  Selected BioMarkers (N=5): rs1027438, rs380390, rs10490924, rs1420150, Age
  Classification Accuracy: 68.5%
Comments:
  AMD results provided by agent: 192.168.208.110
Table 5. Mining results for the input “Leukemia”.
Input: “Leukemia”
Results:
  Dataset: Leukemia1
  Data Type (Level): Microarray (Transcriptomic)
  Selected BioMarkers (N=5): X95735, L09209, M84526, M27891, U50136 rna1
  Classification Accuracy: 94.22%
  Overlap: (With dataset: Leukemia2) X95735, L09209, M27891

  Dataset: Leukemia2
  Data Type (Level): Microarray (Transcriptomic)
  Selected BioMarkers (N=5): M27891, U46499, L09209, X95735, M12959
  Classification Accuracy: 96.05%
  Overlap: (With dataset: Leukemia1) M27891, L09209, X95735

  Dataset: MLL
  Data Type (Level): Microarray (Transcriptomic)
  Selected BioMarkers (N=5): 34168 at, 36122 at, 1096 g at, 1389 at, 266 s at
  Classification Accuracy: 92.14%
Comments:
  Leukemia1 results provided by agent: 192.168.208.110
  Leukemia2 results provided by agent: 192.168.208.112
  MLL results provided by agent: 192.168.208.111
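The overlap entries in Table 5 can be reproduced with a simple intersection of the selected biomarkers. How the aggregation agent actually computes the overlap is not described, so the helper `overlap` below is an assumption; the identifiers are copied from Table 5 as printed.

```python
# Selected biomarkers copied from Table 5 as printed.
SELECTED = {
    "Leukemia1": ["X95735", "L09209", "M84526", "M27891", "U50136 rna1"],
    "Leukemia2": ["M27891", "U46499", "L09209", "X95735", "M12959"],
    "MLL": ["34168 at", "36122 at", "1096 g at", "1389 at", "266 s at"],
}


def overlap(results, a, b):
    """Biomarkers selected for both datasets, in the order listed for `a`."""
    in_b = set(results[b])
    return [marker for marker in results[a] if marker in in_b]


shared_1_2 = overlap(SELECTED, "Leukemia1", "Leukemia2")
shared_2_1 = overlap(SELECTED, "Leukemia2", "Leukemia1")
```

Three of the five markers are shared between Leukemia1 and Leukemia2, the two differently pre-filtered versions of the same data, while MLL, which uses a different identifier scheme, shares none with either.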
5 Discussion and Conclusion
As pointed out by Wren and Bateman, with the burst of biological data in the last two decades, the number of Internet accessible databases has been growing rapidly on an annual basis [36]. Yet the effort of using and maintaining this large number of databases has also become overwhelmingly expensive. While some databases are never used, many others are not updated regularly (or at all) [37]. Over time many of them simply become inaccessible. The multiagent based bio-data mining framework offers a potential way to automate the use and maintenance of these databases. In conclusion, we have argued for applying a multiagent based data mining framework to biological data analysis. The argument has been supported from multiple perspectives by briefly reviewing the advantages of applying such a framework in the context of biological data analysis. We believe the multiagent based bio-data mining framework will help to bridge the knowledge gap between the data mining community and the biology community, and enhance the reusability of biological databases as well as data mining algorithms.
Table 6. Mining results for the input “Cancer”.
Input: “Cancer”
Results:
  Dataset: Breast1
  Data Type (Level): SAGE (Transcriptomic)
  Selected BioMarkers (N=5): CCTTCGAGAT, TTTCAGAGAG, TATCCCAGAA, CTAAGACTTC, TTGGAGATCT
  Classification Accuracy: 98.88%

  Dataset: Breast2
  Data Type (Level): Microarray (Transcriptomic)
  Selected BioMarkers (N=5): NM 003258, AL137514, NM 003079, Contig 15031 RC, AL080059
  Classification Accuracy: 73.39%

  Dataset: Prostate1
  Data Type (Level): MS (Proteomic)
  Selected BioMarkers (N=5): 0.054651894, 125.2173, 271.33373, 478.95419, 362.11416
  Classification Accuracy: 88.31%

  Dataset: Prostate2
  Data Type (Level): Microarray (Transcriptomic)
  Selected BioMarkers (N=5): HPN, TSPAN7, GUSB, ALDH1A3, HEPH
  Classification Accuracy: 92.55%
Comments:
  Breast1, Prostate2 results provided by agent: 192.168.208.111
  Breast2, Prostate1 results provided by agent: 192.168.208.112

References
1. Westerhoff, H.V. and Palsson, B.O.: The evolution of molecular biology into systems biology. Nature Biotechnology 22(10), 1249-1252 (2004)
2. Yang, P.Y. and Zhang, Z.L.: A clustering based hybrid system for mass spectrometry data analysis. In: Proceedings of Pattern Recognition in Bioinformatics 2008 (PRIB 2008), LNBI 5265, 98-109 (2008)
3. Gong, B., Guo, Z., Li, J., Zhu, G., Lv, S., Rao, S. and Li, X.: Application of a genetic algorithm–support vector machine hybrid for prediction of clinical phenotypes based on genome-wide SNP profiles of sib pairs. In: Proceedings of FSKD 2005, LNAI 3614, 830-835 (2005)
4. Wang, H.: Clustering-based approaches to SAGE data mining. BioData Mining 1:5 (2008)
5. Frank, E.: Data mining in bioinformatics using Weka. Bioinformatics 20(15), 2479-2481 (2004)
6. Louie, B., Mork, P., Martin-Sanchez, F., Halevy, A. and Tarczy-Hornoch, P.: Data integration and genomic medicine. Journal of Biomedical Informatics 40, 5-16 (2007)
7. Ding, C. and Peng, H.: Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology 3(2), 185-205 (2005)
8. Eisen, M.B., Spellman, P.T., Brown, P.O. and Botstein, D.: Cluster analysis and display of genome-wide expression patterns. PNAS 95(25), 14863-14868 (1998)
9. Li, L., Weinberg, C.R., Darden, T.A. and Pedersen, L.G.: Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics 17, 1131-1142 (2001)
10. Cao, L., Luo, C. and Zhang, C.: Agent-mining interaction: An emerging area. In: Proceedings of AIS-ADM 2007, 60-73 (2007)
11. da Silva, J.C., Giannella, C., Bhargava, R., Kargupta, H. and Klusch, M.: Distributed data mining and agents. Engineering Applications of Artificial Intelligence 18, 791-807 (2005)
12. Parunak, V.: Go to the ant: Engineering principles from natural agent systems. Ann. Oper. Res. 75, 69-101 (1997)
13. Ooi, C.H. and Tan, P.: Genetic algorithms applied to multi-class prediction for the analysis of gene expression data. Bioinformatics 19, 37-44 (2003)
14. Ding, C.H. and Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17(4), 349-358 (2001)
15. Armano, G., Mancosu, G., Milanesi, L., Orro, A., Saba, M. and Vargiu, E.: A hybrid genetic-neural system for predicting protein secondary structure. BMC Bioinformatics 6(suppl 4):S3 (2005)
16. Keedwell, E. and Narayanan, A.: Discovering gene networks with a neural-genetic hybrid. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2(3), 231-242 (2005)
17. Wang, Y., Makedon, F., Ford, J. and Pearlman, J.: HykGene: a hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data. Bioinformatics 21, 1530-1537 (2005)
18. Zhang, Z. and Zhang, C.: Building agent-based hybrid intelligent systems: A case study. Web Intelligence and Agent Systems 5(3), 255-271 (2007)
19. Zhang, Z. and Yang, P.: An agent-based hybrid system for microarray data analysis. Submitted to IEEE Intelligent Systems
20. Velculescu, V.E., Zhang, L., Vogelstein, B. and Kinzler, K.W.: Serial analysis of gene expression. Science 270, 484-487 (1995)
21. Schena, M., Shalon, D., Davis, R.W. and Brown, P.O.: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270(5235), 467-470 (1995)
22. Baker, P.G., Goble, C.A., Bechhofer, S., Paton, N.W., Stevens, R. and Brass, A.: An ontology for bioinformatics applications. Bioinformatics 15(6), 510-520 (1999)
23. Zambonelli, F., Jennings, N.R. and Wooldridge, M.: Developing multiagent systems: the Gaia methodology. ACM Transactions on Software Engineering and Methodology 12(3), 317-370 (2003)
24. Wooldridge, M., Jennings, N.R. and Kinny, D.: The Gaia methodology for agent-oriented analysis and design. Autonomous Agents and Multi-Agent Systems 3, 285-312 (2000)
25. Hadzic, M. and Chang, E.: Ontology-based multi-agent systems support human disease study and control. In: Proceedings of the SOAS 2005, 129-141 (2005)
26. Klein, R.J., Zeiss, C., Chew, E.Y., Tsai, J.Y., Sackler, R.S., Haynes, C., Henning, A.K., SanGiovanni, J.P., Mane, S.M., Mayne, S.T., Bracken, M.B., Ferris, F.L., Ott, J., Barnstable, C. and Hoh, J.: Complement factor H polymorphism in age-related macular degeneration. Science 308, 385-389 (2005)
27. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D. and Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537 (1999)
28. Armstrong, S.A., Staunton, J.E., Silverman, L.B., Pieters, R., den Boer, M.L., Minden, M.D., Sallan, S.E., Lander, E.S., Golub, T.R. and Korsmeyer, S.J.: MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics 30, 41-47 (2001)
29. Lash, A.E., Tolstoshev, C.M., Wagner, L., Schuler, G.D., Strausberg, R.L., Riggins, G.J. and Altschul, S.F.: SAGEmap: A public gene expression resource. Genome Research 10, 1051-1060 (2000)
30. van’t Veer, L.J., Dai, H., van de Vijver, M.J., He, Y.D., Hart, A.A., Mao, M., Peterse, H.L., van der Kooy, K., Marton, M.J., Witteveen, A.T., Schreiber, G.J., Kerkhoven, R.M., Roberts, C., Linsley, P.S., Bernards, R. and Friend, S.H.: Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530-536 (2002)
31. Petricoin, E.F., Ornstein, D.K., Paweletz, C.P., Ardekani, A.M., Hackett, P.S., Hitt, B.A., Velassco, A., Trucco, C., Wiegand, L., Wood, K., Simone, C.B., Levine, P.J., Linehan, W.M., Emmert-Buck, M.R., Steinberg, S.M., Kohn, E.C. and Liotta, L.A.: Serum proteomic patterns for detection of prostate cancer. Journal of the National Cancer Institute 94, 1576-1578 (2002)
32. Singh, D., Febbo, P.G., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D’Amico, A.D., Richie, J.P., Lander, E.S., Loda, M., Kantoff, P.W., Golub, T.R. and Sellers, W.R.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203-209 (2002)
33. FIPA ACL Message Structure Specification (2002). Available from: www.fipa.org/specs/fipa00061
34. Karasavvas, K.A., Baldock, R. and Burger, A.: Bioinformatics integration and agent technology. Journal of Biomedical Informatics 37, 205-219 (2004)
35. Dudoit, S., Fridlyand, J. and Speed, T.P.: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97, 77-87 (2002)
36. Wren, J.D. and Bateman, A.: Databases, data tombs and dust in the wind. Bioinformatics 24(19), 2127-2128 (2008)
37. Galperin, M.: The molecular biology database collection. Nucleic Acids Res. 34(Database issue), D3-D5 (2006)