A Self-organizing Multi-agent System for Online Unsupervised Learning in Complex Dynamic Environments

Igor Kiselev
Department of Computer Science, University of Calgary, Calgary, AB, Canada, T2N 1N4
[email protected]

Reda Alhajj
Department of Computer Science, University of Calgary, Calgary, AB, Canada, T2N 1N4
[email protected]
ABSTRACT
The task of continuous online unsupervised learning of streaming data in complex dynamic environments under conditions of uncertainty is an NP-hard optimization problem for general metric spaces. This paper describes a computationally efficient adaptive multi-agent approach to continuous online clustering of streaming data that is inherently sensitive to environmental variations and provides a fast dynamic response with event-driven incremental improvement of optimization results, trading off operating time against result quality. Experimental results demonstrate the strong performance of the implemented multi-agent learning system for continuous online optimization of both synthetic datasets and datasets from the RoboCup Soccer and Rescue domains.
Categories and Subject Descriptors
I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence – Multiagent systems.

General Terms
Algorithms, Design.

Keywords
Online unsupervised learning, anytime coalition formation, market-based dynamic distributed resource allocation.
1. INTRODUCTION
It is indispensable for intelligent systems acting in such complex environments as autonomous robotic systems, dynamic manufacturing and production processes, and distributed ad-hoc sensor networks to be capable of successfully responding to environmental dynamics and making time-critical decisions online under conditions of uncertainty. Continuous decision-making and anytime optimization in changing environments under conditions of uncertainty and limited computational resources represent one of the most challenging problems in developing robust intelligent agents. An autonomous system acting in a complex uncertain environment may not be aware of the results of the possible actions it can take and may have no information regarding what constitutes a desirable state or a correct action. To develop a probabilistic theory of the operating environment (a representation of the world), an intelligent system solves the problem of unsupervised learning, that is, the process of discovering significant patterns or features in the input data when no output or response categories (classes) are specified. In order to be effective in solving time-critical problems in complex dynamic environments with higher levels of uncertainty,
an intelligent system must continuously adapt the parameters of its learning system to variations in the incoming signals generated by the non-stationary environment in a real-time fashion. The classical dynamic approach to learning and optimization addresses statistical fluctuations of the incoming data by means of continual retraining of models, which can be inappropriate in time-critical scenarios. We propose a different approach to continuous online learning that views the task of unsupervised clustering as a dynamic distributed resource allocation problem requiring market-based negotiation between different self-interested agents in order to satisfy a distributed constraint network and achieve an implicit global solution to the problem. The proposed multi-agent approach to continuous online clustering of streaming data differs from conventional methods in being distributed and inherently sensitive to environmental variations, rather than merely parallel. The distributed incremental learning algorithm ensures a quick response to changes by directly adapting only those areas of the global decision space that are affected by them. The decentralized decisions of the method are not fixed and are locally reconsidered when needed in response to environmental events, taking advantage of domain semantics. Our two main contributions are a computationally efficient market-based algorithm for continuous agglomerative hierarchical clustering and a knowledge-based self-organizing multi-agent system implementing it (a demo is available online). The developed dynamic learning algorithm provides event-driven incremental improvement of clustering results, trading off operating time against result quality, and its multi-objective decision-making model enables the algorithm to operate on the basis of non-standard optimization criteria using unusual measures of similarity.
The implemented multi-agent platform provides a distributed computational environment (a virtual marketplace) and run-time support for asynchronous quasi-parallel negotiations between the agents over the objects of the complex shared ontological structure (knowledge base). The proposed solution to continuous online clustering is appropriate for various scenarios of time-critical data processing: online intrusion detection, emergency response in hazardous situations (e.g. forest fires), control of military operations with time-critical targets, online learning of multi-agent behavior in distributed robotic systems (e.g. in the RoboCup Soccer and Rescue domains), and run-time detection of previously unknown dispatching rules and effective scheduling policies in transportation logistics.
2. PROBLEM DESCRIPTION AND RELATED WORK
This paper addresses the issue of continuous online agglomerative hierarchical clustering of streaming data in complex dynamic environments under conditions of uncertainty. The problem of unsupervised learning is the process of discovering significant patterns or features in the input data when no output or response categories (classes) are specified. The task of unsupervised clustering in statistical learning requires maximizing (or minimizing) a certain similarity-based objective function defining an optimal segmentation of the input data set into clusters. There are two types of unsupervised clustering algorithms: partitional and hierarchical [1]. The goal of a partitional optimization algorithm is to find an assignment $M^*$ of observations $X$ to output subsets $S$ that minimizes a mathematical energy function, which characterizes the degree to which the clustering goal is not met and is the sum of the costs of the clusters: $W(M^*) = \sum_{i=1}^{k} c(S_i)$. The problem of partitional clustering is known to be computationally challenging (NP-hard) for general metric spaces and is computationally intractable for real-world problems of practical interest (Davidson and Ravi 2005). In comparison to partitional clustering algorithms, hierarchical clustering does not depend on the choice of the number of clusters to be searched or on a starting configuration assignment, and is appealing for problems in which the input data set does not have a single "obvious" partition into well-separated clusters. The goal of a hierarchical optimization algorithm is to extract an optimal multi-level partitioning of data by producing a hierarchical tree $T(X)$ in which the nodes represent subsets $S_i$ of $X$. In particular, the leaves of the tree comprise the individual elements of $X$, $X$ itself constitutes the root of the tree (only one cluster at the highest level), and internal nodes are defined as the union of their children $S_i \cup S_j$. The time and space complexities of hierarchical clustering are higher than those of partitional clustering: in standard cases a typical implementation of a hierarchical clustering algorithm requires $O(N^2 \log N)$ computations.

Conventional methods of data clustering are centralized, static, and batch-oriented. Classical approaches assume centralized access to a static set of input data available in a single location, and rely on fixed decision criteria for learning, specifications of constraints, and common data structures (e.g. similarity matrices) that are invariable during algorithm execution. Clustering results of batch-oriented methods are available only after their full completion, and the methods must be restarted from scratch in order to react to environmental variations. In contrast, the task of continuous online learning in complex dynamic environments assumes near real-time mining of streaming data continually arriving at the system, which imposes the additional requirements on clustering optimization algorithms of being distributed, dynamic, and incremental. In order to be effective in solving time-critical learning problems in changing environments, clustering algorithms must continuously adapt their parameters to environmental perturbations.
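As an illustration of the partitional objective $W(M^*) = \sum_{i=1}^{k} c(S_i)$ discussed above, the following minimal sketch computes the total cost of a candidate assignment. The choice of cluster cost $c(S_i)$ (here, the within-cluster sum of squared distances to the centroid) is an assumption made for illustration, not the paper's actual criterion.

```python
# Illustrative sketch (not the paper's implementation) of the partitional
# objective W(M*) = sum_{i=1..k} c(S_i), with c(S_i) assumed to be the
# within-cluster sum of squared distances to the cluster centroid.

def centroid(points):
    n, dims = len(points), len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]

def cluster_cost(points):
    c = centroid(points)
    return sum(sum((p[d] - c[d]) ** 2 for d in range(len(c))) for p in points)

def total_cost(assignment):
    """assignment: a list of clusters, each a non-empty list of points."""
    return sum(cluster_cost(cluster) for cluster in assignment)

# A well-separated assignment of four points has a much lower W(M):
good = [[(0.0, 0.0), (0.0, 1.0)], [(9.0, 9.0), (9.0, 8.0)]]
bad = [[(0.0, 0.0), (9.0, 9.0)], [(0.0, 1.0), (9.0, 8.0)]]
assert total_cost(good) < total_cost(bad)
```

Finding the assignment that minimizes this objective over all possible partitions is exactly the NP-hard search problem referred to above.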
To address the issue of effective online learning, various approaches have been considered in the literature (refer to the online supplementary materials for a complete overview of related work). Decentralized clustering algorithms were proposed to speed up centralized learning by dividing it among a set of processors and allowing them to learn concurrently, with an additional centralized algorithm aggregating partial mining results into the global solution [2]. To handle the complexity of the problem, approximate clustering algorithms were developed that search for a feasible solution in an incomplete decision space by applying approximation heuristics that reduce the problem dimension but lead to worse result quality [3]. Clustering methods for unsupervised learning of streaming data were developed that support incremental update of the mining result by applying additional repairing methods [4]. Uncertainty of the operating environment is approached by feedback-directed clustering algorithms that apply reinforcement learning techniques to guide the search toward better cluster quality [5]. The distributed-constraint reasoning formalism was proposed to approach such problems as continual scheduling, routing, and resource allocation in dynamic environments by means of distributed problem-solving algorithms, which are better suited to deal with changes in a localized fashion. To solve a task in a distributed manner, it is formulated as a distributed constraint satisfaction or optimization problem. A distributed constraint optimization problem (DCOP) defines a set of variables, distributed among multiple agents, each with a possible set of values, and a set of constraints that assign costs to combinations of value assignments [6]. The goal of an optimization algorithm is to find the best assignment of values to variables, such that the total cost of the set of constraints over the variables is either minimized or maximized.
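The DCOP formulation just described can be made concrete with a deliberately tiny example. The instance, variable names, and brute-force enumeration below are illustrative assumptions only, and bear no relation to the efficient distributed algorithms discussed next.

```python
# Toy DCOP instance (illustrative only, not the paper's formulation):
# two variables with domains {0, 1}; constraints assign costs to
# combinations of value assignments; the goal is the minimum-cost
# complete assignment, found here by brute-force enumeration.

from itertools import product

variables = {"x1": [0, 1], "x2": [0, 1]}
constraints = [
    (("x1", "x2"), lambda a, b: 0 if a != b else 2),  # prefer differing values
    (("x1",), lambda a: a),                           # prefer x1 = 0
]

def cost(assignment):
    return sum(c(*(assignment[v] for v in names)) for names, c in constraints)

var_names = list(variables)
best = min(
    (dict(zip(var_names, values)) for values in product(*variables.values())),
    key=cost,
)
assert best == {"x1": 0, "x2": 1} and cost(best) == 0
```

In a real DCOP, each agent owns its variable and no agent sees the full cost table; the distributed algorithms below exist precisely to avoid this centralized enumeration.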
The task of Dynamic DCOP allows constraints to be added to or removed from the problem as external environmental conditions change. Various efficient distributed algorithms have been developed for solving distributed constraint optimization problems (e.g. DPOP, OptAPO, ADOPT, and Iterative Distributed Breakout) by applying different techniques, such as utility-propagation mechanisms for optimization based on dynamic programming, distributed approximation schemes for anytime optimization, and self-stabilizing propagation protocols for continuous-time optimization [7], [8]. While guaranteeing that the optimal solution is found, these algorithms introduce an exponential term in computational overhead (OptAPO) or communication overhead (DPOP in the message size and ADOPT in the message number). Consequently, such computationally expensive complete methods as DCSP and DCOP may not be suitable for providing a fast response to environmental variations, and such distributed incomplete local search methods as IDB can be too expensive in large-scale dynamic environments [9]. As opposed to previous work, we propose a different multi-agent approach to continuous online learning with a view to responding to rapid changes in the environment, by modeling the task of unsupervised clustering as a dynamic distributed resource allocation problem that requires market-based negotiation between different self-interested agents in order to satisfy a distributed constraint network and achieve an implicit global quasi-optimal solution to the optimization problem. We demonstrate that the task of continuous unsupervised learning, when formulated as a dynamic distributed resource allocation problem, can be effectively approached by adapting the market-based method for dynamic distributed resource allocation developed for solving the task of continuous scheduling in transportation logistics [10].
3. A MULTI-AGENT APPROACH TO ONLINE LEARNING
The developed multi-agent approach to adaptive online unsupervised learning of streaming data is based on an asynchronous message-passing method of continuous agglomerative hierarchical clustering and a knowledge-based competitive multi-agent system implementing it (refer to the online supplementary materials for a software demonstration). The online learning algorithm implements the concept of clustering by asynchronous message-passing [11], whereby the solution to the constrained optimization problem of clustering is obtained by satisfying a dynamic distributed constraint network defined over data elements. A similarity-based objective function, which defines an optimal segmentation of the input data set into clusters, is implicitly maximized when all preferences of the network (the self-interested mining agents of records and clusters in the virtual clustering marketplace) are satisfied. The proposed computationally efficient multi-agent algorithm for online agglomerative hierarchical clustering differs from conventional unsupervised learning methods in being distributed, dynamic, and continuous. The distributed clustering process provides the ability to perform efficient run-time learning from both centralized and decentralized data sources (logically separated or physically distributed across a heterogeneous data environment) without an additional centralized algorithm for aggregating partial mining results. Both the input dataset of decentralized sources and the decision criteria for learning (e.g. similarity matrices and expert knowledge) are not fixed and can be changed at run-time during execution of the dynamic algorithm. The continuous adaptive process of distributed learning is inherently sensitive to environmental variations and provides a fast dynamic response with event-driven incremental improvement of learning results.
Clustering results of the adaptive learning algorithm are available at any time and are continuously improved to achieve a global quasi-optimal solution to the optimization problem, trading off operating time against result quality. The data-driven self-organizing process of dynamic continuous optimization is based on a constant distributed search to maintain a dynamic balance among the interests of all competitive participants in the interaction by means of a decentralized market-based method for multi-agent coordination.
3.1 A Market-based Algorithm of Continuous Agglomerative Hierarchical Clustering
The implemented auction-based method of agent negotiations is based on a modified Contract-Net Protocol, in which agents submit bids based on the cost of a possible allocation variant. Each participant in the negotiation evaluates new allocation options and sends an approval only when the criterion value of a new agent state, with a newly established link and broken previous relationships with other agents (if any), is better than the value of the current agent state. Autonomous agents of records and clusters
continuously negotiate with each other in the virtual clustering marketplace to enhance their satisfaction levels with minimal costs and establish semantic links of the highest utility, according to the following ongoing algorithmic process.

(1) Continuous arrival of data elements at the system. For each data or cluster element continuously arriving at the system at run-time, the Biosphere Agent, a special service-type agent, continuously parses the description of the clustering situation, constructs the initial ontological scene (instances and relationships) in the mining ontology, and produces a population of corresponding mining agents in the virtual clustering marketplace.

(2) Locating candidate agents for allocation negotiations (semantic-based pre-matching). Once created, an agent examines the mining ontology to find appealing candidate agents with which to initiate allocation negotiations inside its limited field of vision, considering domain-dependent heuristics. The agent evaluates the value of the possible ontological relationship with the located candidate according to the agent criterion of the corresponding negotiation type. Additionally, the agent has the opportunity to evaluate this possible relationship from the partner's side, in order to send the partner a negotiation request only when aware of the partner's interest in participating in the negotiations.

(3) Allocating a record to an existing cluster. To accomplish the allocation goal, a record agent can participate in three types of allocation negotiation processes: it can interact with existing cluster agents to join one, or negotiate with either other record agents or cluster agents to create a new cluster to be allocated to (algorithmic steps #3, #4, #5 respectively).

(3.1) Requesting membership in a cluster by a record agent (task announcement). A record agent sends a membership request to selected cluster agents.
An autonomous agent of a candidate cluster can also proactively send a membership proposal to an appealing single record agent without receiving a membership request (algorithmic step #7).

(3.2) Proposing membership in a cluster by a cluster agent (bidding for allocation). A cluster agent evaluates the value of the possible allocation option (ontological relationship) with the partner according to its agent criterion of the corresponding negotiation type ("allocating a record to a cluster"). The cluster agent selects the single candidate record agent with the most profitable allocation variant that increases the value of its current agent state and sends it a membership proposal.

(3.3) Establishing a membership contract by a record agent (allocation bid processing). The record agent selects the single candidate cluster agent with the most profitable allocation variant that increases the value of its current agent state and sends it a membership contract.

(3.4) Performing allocation of a record to a cluster by a cluster agent (membership contract processing). Once a membership contract has been awarded to a cluster agent, the initiator and participant of the negotiation process commit to the target allocation goal to establish a mutually profitable ontological relationship. Applying an allocation plan leads to breaking previous ontological relationships (if any), creating new semantic instances, establishing new relationships with other agents, and producing a new mining agent. Subsequently, since all agents have a perceptual capability to observe changes in the semantic
network and receive notifications of the occurrence of events with their ontological objects, those agents that were affected by the allocation plan (whose ontological instances have been altered) update their internal knowledge representation of the mining ontology and compute the criteria values of their new agent states.

(4) Creating a cluster by record agents; (5) Creating a cluster of a higher level by record and cluster agents. A record agent can consider the option of creating a cluster of a higher level to be allocated to, together with another record or cluster agent, in case there are either no cluster agents that can increase the value of the current state of the record agent, or all of them refuse its membership request because such an allocation option is not profitable for them. At the final protocol step, a participant record agent performs a mutually profitable allocation plan by creating a cluster instance and establishing new ontological relationships with it. Once a new cluster instance is created, a new cluster agent is produced in the virtual clustering marketplace and assigned to it.

(6) Creating a cluster of a higher level by cluster agents. Besides negotiating with record agents to enhance its quality, a cluster agent has the second goal of establishing the most profitable allocation with the agents of clusters. To achieve this goal, a cluster agent participates in a negotiation process of the synthesis type with other cluster agents to create a cluster of a higher level to be a part of.

(7) Agent proactive improvements of optimization results. The implemented agent architecture allows mining agents to balance reactive and proactive behaviors. The agents exhibit not only reactive behavior by simply responding to system events, but also proactively search for the most profitable allocation variants and initiate negotiations to establish and reconsider ontological relationships with other agents, thereby enhancing the results of the dynamic clustering process.
During the active state of agent proactive improvements, agents can restrictively regulate the depth of their vision and consider allocation options with an increased length of the "ripple effect", a decision reconsideration chain that improves the overall clustering results (adaptation). Additionally, different centralized and distributed control mechanisms for self-organization are applied to steer the transition of the dynamic optimization process toward the global quasi-optimal state while avoiding undesirable local attractors [9].

(8) Terminating algorithm execution. In situations when no changes of ontological relationships and cluster memberships can increase the values of the agents, the system eventually settles down to a new quasi-optimal state and the distributed learning process turns inactive. The algorithm is terminated either when a predefined amount of time elapses or when it is known that the operating environment is static and no more input data elements will arrive at the system.

Figure 1 presents a conceptual example of the data-driven process of solving the dynamic constraint satisfaction problem in the local context of the agents affected by an external event. The first panel of the figure shows the initial configuration of cluster and record agents in the virtual clustering marketplace. An external event (a new record #4 appears in the system) triggers the process of conflict-driven balancing of agent interests, which leads to a new configuration of the dynamic equilibrium of agent interests, shown in the last panel. The middle panel shows the distributed constraint network of the self-interested agents negotiating with each other in the virtual clustering marketplace.

[Figure 1 panels: new event (record #4 arrived); conflict-driven balancing of agent interests; dynamic equilibrium of the constraint network.]

Figure 1. Dynamic constraint satisfaction in the local context
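One round of the auction-based membership negotiation (steps 3.1–3.4) can be sketched as a toy exchange. The class names, the greedy gain rule, and the pairwise-similarity criterion below are illustrative assumptions, not the system's actual ontology or API.

```python
# Toy sketch of one round of the auction-based membership negotiation
# (steps 3.1-3.4). Records are plain numbers for brevity; the criterion
# and the greedy award rule are hypothetical simplifications.

from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class ClusterAgent:
    name: str
    members: list = field(default_factory=list)

    def state_value(self, members):
        # Hypothetical criterion: reward similar member pairs,
        # penalize dissimilar ones.
        return sum(1.0 / (1.0 + abs(a - b)) - 0.5
                   for a, b in combinations(members, 2))

    def bid(self, record):
        """Step 3.2: bid only if accepting the record improves the state."""
        gain = (self.state_value(self.members + [record])
                - self.state_value(self.members))
        return gain if gain > 0 else None

@dataclass
class RecordAgent:
    value: float

    def negotiate(self, clusters):
        # Step 3.1: announce the membership task; collect cluster bids.
        bids = [(c, c.bid(self.value)) for c in clusters]
        bids = [(c, g) for c, g in bids if g is not None]
        if not bids:
            return None  # falls through to cluster creation (steps 4-5)
        # Step 3.3: award a contract to the most profitable variant.
        best, _ = max(bids, key=lambda item: item[1])
        best.members.append(self.value)  # step 3.4: allocation performed
        return best.name

a, b = ClusterAgent("A", [1.0, 1.2]), ClusterAgent("B", [5.0])
assert RecordAgent(5.1).negotiate([a, b]) == "B"
```

Here cluster B wins the contract because its pairwise-similarity gain from accepting the record is positive, while cluster A refuses: accepting a distant record would lower the value of its current agent state.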
3.2 Agent Decision-making Model
Goal-driven behavior of autonomous agents is supported by the developed multi-objective microeconomic decision-making model, which defines for each agent in the virtual marketplace ontology its individual goals, criteria, preference functions, and decision-making strategies. Personal agent accounts, with violation functions of agent preferences, reflect the individual expenses and profits of all agents at any moment of the distributed decision-making process. Competitive agents of records and clusters act in the virtual clustering marketplace to satisfy their individual goals and maximize their criteria values according to the chosen decision-making strategy.
3.2.1 Agent goals
Semantic agents of records have the goal of establishing the most profitable allocation with the agents of clusters according to their individual agent criteria ("to be allocated"). To accomplish the allocation goal, a record agent can either send a membership application to an existing cluster to join it, or be allocated to a new cluster, which can be created as a result of a negotiation process of the synthesis type with either other record agents or existing cluster agents. To support hierarchical clustering, two different goals are defined for a cluster agent within its decision-making model. The first goal of a cluster agent is to establish links with the agents of records to create the most profitable ontological cluster of the best quality (e.g. maximizing cluster density or the overall similarity of its members) ("allocate"). The second goal of a cluster agent has the same notion as the goal of a record agent, to establish the most profitable allocation with the agents of clusters, and is defined through the task of establishing the most effective relationship of the "part-of" type with another cluster agent ("to be allocated"). A cluster of a higher level can be created as a result of a negotiation process of the synthesis type between different cluster agents. The hierarchical decomposition of the goals of the mining agents into planning tasks is represented using the notation of the Agent Modeling Language in the following diagram (Figure 2).
[Figure 2 diagram: the root goal ClusterGoal decomposes into BeAllocated (JoinCluster, FormClusterWithRecord, FormClusterWithCluster) and Allocate (AllocateRecord, FormCluster).]

Figure 2. Hierarchical decomposition of agent goals into planning tasks

3.2.2 Agent decision-making strategies
The developed multi-objective decision-making model makes it possible for the learning algorithm to operate on the basis of dynamic non-standard optimization criteria, and makes it suitable for exploratory data analysis using unusual measures of similarity. Currently supported agent decision-making strategies are based on the following agent criteria: the Euclidean distance-based measure of similarity, the Chebychev similarity metric, and the angle metrics defining polarization ("shape") of agent communities (multilevel and multicultural) in decision space. Agent decision-making strategies can be applied dynamically at run-time to the whole agent society (global level), to a single agent (individual level), or to agent groups in different areas of the virtual clustering marketplace (several polarization vectors).

The Euclidean distance-based measure $S(x_i, x_{i'})$ [1] is based on the definition of dissimilarity between the individual objects being clustered and is defined for two objects $i$ and $i'$ according to equations (1)–(3):

$S(x_i, x_{i'}) = 1 / (1 + D(x_i, x_{i'}))$    (1)

$D(x_i, x_{i'}) = \sum_{j=1}^{p} w_j \, d_j(x_{ij}, x_{i'j})$    (2)

$d_j(x_{ij}, x_{i'j}) = (x_{ij} - x_{i'j})^2$    (3)

where $D(x_i, x_{i'})$ is the overall object dissimilarity defined as the weighted average over all $p$ dimensions, and $d_j(x_{ij}, x_{i'j})$ is the distance-based dissimilarity defined as the squared Euclidean distance for a specific dimension.

Utilizing the agent criterion based on the Euclidean distance-based measure of similarity, the learning algorithm relies on the ability to calculate the centroid of each cluster $S_i$. For input spaces where this is not possible, we propose to use the Chebychev similarity metric as a basis for a similarity measure between objects. The Chebychev similarity measure is based on the definition of the Chebychev distance metric, in which the distance between two vectors is the greatest of their differences along any coordinate dimension: $cd(x_i, x_{i'}) = \max_j |x_{ij} - x_{i'j}|$. The Chebychev similarity metric $SC(X)$ is defined as the average of the Chebychev similarity metrics over all dimensions of the dataset according to equations (4)–(7):

$SC(X) = \frac{1}{K} \sum_{j=1}^{K} SC_j(X)$    (4)

$SC_j(X) = 1 - cw_j(X) / cw(X)$    (5)

$cw(X) = \max_j (cw_j(X))$    (6)

$cw_j(X) = \max(X_j) - \min(X_j)$    (7)

where $SC_j(X)$ is the Chebychev similarity metric on attribute $j$ (for a specific dimension), $cw(X)$ is the maximum distance between the points in any single dimension, $cw_j(X)$ is the Chebychev width of a cluster on a single attribute, and $K$ is the number of attributes of the dataset values. Using the Chebychev similarity metric as a similarity measure has the advantage of being less computationally expensive than the distance-based metric of similarity, since there is no need to estimate parameters of all records allocated to the cluster during the negotiation process.

The dynamic distributed learning process ensures determinate mining results, while its execution is non-determinate, asynchronous, and quasi-parallel. As opposed to conventional agglomerative hierarchical clustering techniques (e.g. the single-linkage clustering algorithm), the standard order of building the clustering hierarchy, from the bottom to the top, may be violated during the dynamic distributed learning process. To express the agent preference for establishing ontological relationships between agents at the same hierarchical level, the agent similarity criteria include a penalty for establishing links between agents at different levels of the clustering hierarchy. In the developed agent decision-making model, a normalized preference function is defined for each agent criterion, which allows an agent to simultaneously consider the values of different criteria in the reasoning process. The different outcomes of applying certain agent decision-making strategies are shown in the following conceptual diagram (Figure 3).

[Figure 3 panels: an event occurs (new record #1 arrived); under the strategy "minimizing distance" record #1 is allocated to cluster "A"; under the strategy "keeping shape" record #1 is allocated to cluster "B".]

Figure 3. Different agent decision-making strategies define optimization criteria
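Equations (1)–(7) can be transcribed into a short sketch. Uniform weights $w_j = 1$ and data points as plain equal-length tuples of numbers are simplifying assumptions made here for illustration.

```python
# Sketch of the Euclidean distance-based similarity (eqs. 1-3) and the
# Chebychev similarity metric (eqs. 4-7), assuming uniform weights w_j = 1.

def euclidean_similarity(x, y, w=None):
    """S(x, x') = 1 / (1 + D(x, x')), with D = sum_j w_j (x_j - x'_j)^2."""
    w = w if w is not None else [1.0] * len(x)
    d = sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x, y))
    return 1.0 / (1.0 + d)

def chebychev_similarity(cluster):
    """SC(X): mean over the K attributes of 1 - cw_j(X) / cw(X), where
    cw_j is the cluster's width on attribute j and cw the maximum width."""
    k = len(cluster[0])
    widths = [max(p[j] for p in cluster) - min(p[j] for p in cluster)
              for j in range(k)]
    cw = max(widths)
    if cw == 0:  # all points coincide: treat the cluster as perfectly similar
        return 1.0
    return sum(1.0 - cw_j / cw for cw_j in widths) / k
```

Note that $SC(X)$ depends only on the per-attribute minima and maxima of a cluster, which is consistent with the claim above that a cluster agent can evaluate a membership option without re-estimating parameters of every allocated record.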
Normalized similarity metrics for calculating agent criteria values are defined for each agent type and each agent goal, and are distinguished for each type of agent negotiation a mining agent participates in: for record agents (algorithmic steps #3, #4, #5) and for cluster agents (algorithmic steps #3, #5, #6). Additional equations are defined for calculating agent criteria values for agent self-appraisal in the virtual clustering marketplace. Each of these equations is defined for three types of similarity metrics (the Euclidean distance-based measure of similarity, the Chebychev similarity metric, and the angle metrics). Thus, the complete implemented optimization model, representing the full set of agent criteria for performing agglomerative hierarchical clustering by satisfying the distributed constraint network, consists of 24 equations in total and is omitted from the paper due to space limitations (refer to the online supplementary materials).
4. CONTROLLING THE SYSTEM
The online optimization algorithm continuously processes data elements arriving at the system and can be stopped either by operator request or by detecting the absence of agent activities in the system. In the latter case, the multi-agent system temporarily proceeds to an inactive sleeping state to release computational resources. Mining agents, having detected changes in the virtual environment that increase their satisfaction level, transform the system back from the passive sleeping state to the active learning mode (Figure 4).

[Figure 4 states: initial states and working states of the multi-agent system.]

Figure 4. Operating modes of the multi-agent system

A multi-agent system performing the distributed learning process is fully controllable by the user, who can manage its execution with the following control elements: run, step, and stop (Figure 5).

Figure 5. Control elements for managing system execution

Parameters of the optimization algorithm are dynamically configurable at run-time on the dashboard for algorithm properties. Thus, the pre-matching radiuses for the agents of records and clusters (a domain-specific heuristic of the semantic-based pre-matching algorithm that specifies a limited field of vision for mining agents to locate candidate agents for allocation negotiations) can be dynamically adjusted at any time (Figure 6).
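The effect of the adjustable pre-matching radius can be sketched as follows; the function name, point representation, and Euclidean field of vision are hypothetical illustrations, not the system's actual heuristics.

```python
# Hypothetical sketch of semantic-based pre-matching: an agent considers
# only candidate agents inside its run-time adjustable pre-matching
# radius (its limited field of vision).

def prematch(agent_pos, candidates, radius):
    """Return the candidates within `radius` of the agent's position."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return [c for c in candidates if dist(agent_pos, c) <= radius]

candidates = [(0.5, 0.0), (3.0, 4.0), (10.0, 0.0)]
# Widening the radius at run-time lets the agent see more candidates.
assert prematch((0.0, 0.0), candidates, radius=1.0) == [(0.5, 0.0)]
assert prematch((0.0, 0.0), candidates, radius=5.0) == [(0.5, 0.0), (3.0, 4.0)]
```

A smaller radius reduces the number of negotiations each agent initiates, at the price of possibly overlooking profitable allocation variants.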
An external event leads to an effective system reorganization, during which the system transitions from one dynamic state of balance into a new, more effective one by breaking previously established links between record and cluster agents and triggering chains of local decision reconsideration (the ripple effect). The following system events are currently supported to simulate environmental variations and to analyze the subsequent event-driven incremental improvement of optimization results: “inject” to introduce a new record or cluster into the system, “freeze” and “unfreeze” to disable or enable a particular agent's participation in negotiations, and “perish” and “disband all” to destroy a particular agent or all agents in the virtual clustering marketplace. The multi-agent system responds to an external event by locally reorganizing those areas of the global decision space that are affected by the event. The length of the ripple effect can be estimated by the number of new agents created as a result of the event. Thus, destroying a cluster agent at the top level of the clustering hierarchy leads to the creation of only one new cluster agent, whereas destroying one at the lowest level of the hierarchy leads to the synthesis of thirteen cluster agents (Figure 7).

Figure 7. Ripple effect resulting from destroying a cluster agent: A) at the top level, RE = 1; B) at the lowest level, RE = 13
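The relationship between the destroyed agent's position in the hierarchy and the ripple-effect length can be illustrated with a toy model. The hierarchy shape and the counting rule below are assumptions for illustration only, not the paper's implementation (hence the toy yields RE = 15 rather than the 13 reported for the system's own configuration).

```python
# Toy model of the ripple effect: destroying a cluster agent at the top of
# the hierarchy is repaired by creating a single replacement, while
# destroying one near the bottom forces every level up to the root to
# re-form its sub-clusters along the reconsideration chain.

def ripple_length(level, depth=3, branching=2):
    """RE = number of new cluster agents created after destroying a
    cluster agent at `level` (0 = lowest level, `depth` = top level)."""
    if level == depth:
        return 1  # a single replacement agent at the top
    # each level from the destroyed agent up to the root re-forms its clusters
    return sum(branching ** (depth - lvl) for lvl in range(level, depth + 1))

print(ripple_length(level=3))  # top level -> RE = 1
print(ripple_length(level=0))  # lowest level -> RE = 15 in this toy hierarchy
```

The monotonic growth of RE with depth matches the qualitative behavior shown in Figure 7: the lower the destroyed agent sits, the longer the chain of local reconsiderations.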
5. EXPERIMENTAL EVALUATION

The proposed multi-agent method of online optimization was tested for continuous agglomerative hierarchical clustering of both synthetic datasets and datasets from the following real-world domains: the RoboCup Soccer and Rescue competitions and the bioinformatics domain (gene expression datasets). The experimental datasets for the RoboCup Soccer domain were obtained by analyzing log files of the 2006 final game (simulation league) between the teams “Brainstormers” from the University of Osnabrueck (Germany) and “WrightEagle” from the University of Science and Technology of China. The datasets for the clustering optimization task were obtained using a data preparation framework, which parses log files of previous RoboCup Soccer games and generates knowledge representation structures of agent action scenes suitable for data mining purposes [12]. The datasets for the RoboCup Rescue domain were generated in the ARES (Agent Rescue Emergency Simulator) multi-agent simulation environment [13]. The conducted experiments demonstrated the strong performance of the developed multi-agent learning system for continuous online clustering of the RoboCup Soccer and Rescue datasets. The experiments report the solution quality under various time constraints, data arrival patterns, and different agent decision-making strategies. Further experiments will be conducted to investigate the performance of the algorithm working concurrently with distributed robotic systems at run-time.
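The data-preparation step can be sketched as turning log lines of agent actions into numeric records suitable for clustering. The line format below is a simplified stand-in, NOT the actual RoboCup Soccer log format, and `parse_action_record` is a hypothetical helper.

```python
# Minimal sketch: one log line of an agent action scene -> one clustering
# record. The "time player x y action" format is an illustrative assumption.

def parse_action_record(line):
    """Parse 'time player x y action' into a record with a feature vector."""
    t, player, x, y, action = line.split()
    return {"player": player, "action": action,
            "features": (float(t), float(x), float(y))}

log = ["12.0 p7 -10.5 3.2 kick",
       "12.1 p7 -10.4 3.3 dash"]
records = [parse_action_record(line) for line in log]
print(records[0]["features"])  # (12.0, -10.5, 3.2)
```

In the actual framework [12], the parsed scenes are richer knowledge representation structures, but the principle is the same: each logged action becomes a point in the feature space that the record agents represent in the marketplace.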
Figure 6. Dashboard for configuring algorithm parameters at run-time
Learning results for clustering datasets from different experimental testbeds are presented in Figure 7 as a balloon diagram (multilevel and multicultural agent communities), in Figure 8 as a radial diagram, and in Figure 9 as a dendrogram.
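A dendrogram such as the one in Figure 9 visualizes the merge sequence of an agglomerative hierarchical clustering. A minimal single-linkage sketch (a conventional sequential baseline, not the paper's market-based algorithm) that produces such a merge list:

```python
# Single-linkage agglomerative clustering of 1-D points: repeatedly merge
# the two closest clusters and record (cluster_a, cluster_b, distance).
# The recorded merge list is exactly what a dendrogram plots.

def single_linkage(points):
    """Return the merge sequence [(id_a, id_b, distance), ...]."""
    clusters = {i: [p] for i, p in enumerate(points)}
    merges, next_id = [], len(points)
    while len(clusters) > 1:
        (a, b), d = min(
            (((i, j), min(abs(p - q) for p in clusters[i] for q in clusters[j]))
             for i in clusters for j in clusters if i < j),
            key=lambda pair: pair[1])
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)  # new cluster id
        merges.append((a, b, d))
        next_id += 1
    return merges

# The two nearest records merge first; the resulting cluster merges last.
print(single_linkage([0.0, 0.1, 1.0]))
```

Each tuple corresponds to one horizontal join in the dendrogram, with the distance giving the join's height.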
Nevertheless, the preliminary experimental results revealed a current limitation of the algorithm in efficiently performing massive processing of high-dimensional data (e.g., genome-wide gene expression data), even though our primary goal is to handle dynamic environments rather than massive static datasets.
Figure 7. A dataset from the RoboCup Rescue domain
Figure 8. A dataset from the RoboCup Soccer domain
6. CONCLUSION AND FUTURE DIRECTIONS

In this paper, we have introduced an adaptive multi-agent approach to continuous agglomerative hierarchical clustering of streaming data, which is inherently sensitive to environmental variations and provides a fast dynamic response with event-driven incremental improvement of optimization results, trading off operating time against result quality. We described how the developed multi-objective decision-making model enables the algorithm to operate on the basis of non-standard optimization criteria using unusual measures of similarity. The implemented multi-agent platform of the learning system provides a distributed computational environment and run-time support for asynchronous quasi-parallel negotiations between the agents in a virtual marketplace. The conducted experiments demonstrated the strong performance of the developed multi-agent learning system for continuous online clustering of both synthetic datasets and datasets from the RoboCup Soccer and Rescue testbeds.

Nevertheless, the developed prototype system revealed a current limitation of the algorithm in efficiently performing massive processing of high-dimensional data. To address this issue, we have been working on an approach to reducing the problem dimension while maintaining the essential characteristics of the original system (the concept of renormalization) by modeling a multi-agent system in an incomplete decision space as a decentralized partially observable Markov decision process (DEC-POMDP) and applying Bayesian statistical methods. A subject of particular interest for us is developing a hybrid learning approach, both intelligible and computationally efficient, that combines dynamic multi-agent unsupervised learning and model-based reinforcement learning of POMDPs.
A desirable extension of the learning approach is a bidirectional cyclic technique for adapting auction-based methods of agent negotiation based on feedback from a stochastic estimation model provided by a reinforcement learning framework. Such an approach to continual learning under conditions of uncertainty addresses the theoretical question of combining the methods of two main schools that offer different perspectives on how to solve multi-agent coordination problems in complex uncertain environments: distributed constraint reasoning techniques and decision-theoretic approaches such as Markov decision processes [14]. Additionally, further research is being conducted on extending the adaptive learning approach to support online semi-supervised classification [15] by continuously deducing semantic-based classification rules from clustering results and performing automatic rule-based classification at run-time.
Figure 9. A gene dataset with 200 genes and 40 patients
Further computational experiments need to be conducted to evaluate the performance of different control methods and coordination mechanisms. We have been working on adapting benchmark datasets [16] so that the proposed solution can be compared with existing dynamic optimization algorithms in various settings. We consider the developed solution an important step in our research towards the development of effective online learning algorithms for dynamic uncertain environments, and we plan to explore how to learn non-stationary dynamics.
7. REFERENCES
[1] Jain, A. K., Murty, M. N., Flynn, P. J. Data clustering: a review. ACM Computing Surveys 31(3):264–323, 1999.
[2] Zhang, S. C., Wu, X. D., Zhang, C. Q. Multi-database mining. IEEE Computational Intelligence Bulletin 2(1):5–13. IEEE Computer Society, 2003.
[3] Davidson, I., Ravi, S. S. Agglomerative Hierarchical Clustering with Constraints: Theoretical and Empirical Results. In Proceedings of the Ninth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2005), LNCS Vol. 3721, 59–70. Berlin, Germany: Springer Berlin-Heidelberg, 2005.
[4] Aggarwal, C. C., ed. Data Streams: Models and Algorithms. New York, NY: Springer, 2006.
[5] Bagherjeiran, A., Eick, C. F., Chen, C. S., Vilalta, R. Adaptive clustering: obtaining better clusters using feedback and past experience. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM-05), 565–568. Washington, DC: IEEE Computer Society, 2005.
[6] Modi, P. J., Jung, H., Tambe, M., Shen, W., Kulkarni, S. Dynamic Distributed Resource Allocation: A Distributed Constraint Satisfaction Approach. Revised Papers from the 8th International Workshop on Intelligent Agents VIII, LNCS Vol. 2333, 264–276. London, UK: Springer-Verlag, 2002.
[7] Kumar, A., Petcu, A., Faltings, B. H-DPOP: Using Hard Constraints to Prune the Search Space. In Proceedings of the Eighth International Workshop on Distributed Constraint Reasoning (DCR-07), in conjunction with IJCAI'07, Hyderabad, India, 2007.
[8] Modi, P. J., Shen, W., Tambe, M., Yokoo, M. ADOPT: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence Journal (AIJ) 161:149–180, 2005.
[9] Theocharopoulou, C., Partsakoulakis, I., Vouros, G. A., Stergiou, K. Overlay networks for task allocation and coordination in dynamic large-scale networks of cooperative agents. In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-07), 295–302. New York, NY: ACM Press, 2007.
[10] Kiselev, I., Glaschenko, A., Chevelev, A., Skobelev, P. Towards an adaptive approach for distributed resource allocation in a multi-agent system for solving dynamic vehicle routing problems. In Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 1874–1875. Menlo Park, CA: AAAI Press, 2007.
[11] Frey, B. J., Dueck, D. Clustering by Passing Messages Between Data Points. Science 315:972–976, 2007.
[12] Kiselev, I. P., Alhajj, R. S., Leung, H. Learning a multi-agent behavior of a distributed autonomous robotic system. In Proceedings of “Scientific Session MEPhI-2008”. Moscow, Russia: MEPhI Press, 2008.
[13] Denzinger, J., Kidney, J. ARES 2: A Tool for Evaluating Cooperative and Competitive Multi-agent Systems. In Proceedings of the Eighteenth Conference of the Canadian Society for Computational Studies of Intelligence, LNCS Vol. 3501, 38–42, 2005.
[14] Nair, R., Varakantham, P., Tambe, M., Yokoo, M. Networked Distributed POMDPs: A Synthesis of Distributed Constraint Optimization and POMDPs. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), 2005.
[15] Basu, S., Bilenko, M., Banerjee, A., Mooney, R. J. Probabilistic Semi-supervised Clustering with Constraints. In Semi-Supervised Learning, eds. Chapelle, O., Schoelkopf, B., Zien, A., 73–102. Cambridge, MA: MIT Press, 2006.
[16] Dataset repository for distributed constraint optimization problems. The University of Southern California, 2008. Available at: http://teamcore.usc.edu/dcop/