A Platform for Large-Scale Game-Playing Multi-Agent Systems on a High Performance Computing Infrastructure

Chairi Kiourt and Dimitris Kalles
School of Science and Technology, Hellenic Open University, Patra, GR-26335, Greece
{chairik,kalles}@eap.gr

ABSTRACT

The simulation of societies requires vast amounts of computing resources, which must be managed over distributed or high performance computing infrastructures to provide for cost-effective experimentation. To that end, this paper presents a novel platform for the segmentation and management of social simulation experiments in game-playing multi-agent systems; the platform also serves as a working proof of concept for similar experiments. The platform is managed through a web-based graphical user interface, which combines the advantages of powerful grid infrastructure middleware and sophisticated workflow systems, sacrificing some generic functionality for the benefit of a smooth and brief learning curve, without compromising security. The paper sets out the architecture and implementation details of the platform and demonstrates its use with two sample games, RLGame and Rock Scissors Paper, to underline the scale of the experiments and to indicate the class of social simulation problems that it can help investigate. The platform can be loosely coupled with analytics software for data mining; for our sample problems, this analysis leads to associating the learning mechanism each agent employs with its eventual performance ranking.

Keywords: Distributed Computing, Multi-Agent Systems, Game Playing, Social Simulation.


1. INTRODUCTION

Research in quite a few scientific areas can be very demanding on computational resources [1]. Usually, such demands are due to complex experimental studies and their subsequent analysis. For example, Artificial Intelligence (AI) experiments trying to simulate the human brain require thousands of Central Processing Unit (CPU) cores to simulate brain activity for just one second [2]. Similarly, most Agent-Based Simulation (ABS) experiments are implemented on distributed computing systems [3][4]; out of such work the field of computer social simulation has emerged [5]. Resources for large-scale experiments and Multi-Agent Systems (MAS) are often available via Grid Infrastructures (GI) [1][4]. These have driven the development of Agent-Based Modeling (ABM) platforms [5][6], which hold substantial potential despite their limitations. The most prominent limitation of those platforms is that experiments run on single clients in computer clusters and, consequently, the distribution of the experiment has to be programmed by the users, who must be competent at using external distribution libraries. This limitation is overcome by our platform.

In the early 1990s computer social simulation was presented as a way of modeling and understanding social processes [5], based on ideas about the emergence of complex behavior from relatively simple activities [7][8]. Moreover, social learning research has been inspired by the ability of humans to learn from environments which are rich in people, interactions and unknown information [9]. Social Learning (SL) is usually introduced as a learning technique in environments where more than two agents act autonomously, each one with its own goal, information and knowledge about the world (and the other agents) [7][21]. Social learning, like Reinforcement Learning (RL), was primarily developed using traits of human behavior, i.e.
the ability of humans to learn from environments that are rich in people, interactions and lots of unknown information. Studying multi-agent systems can be best visualized along a primary axis, which spans the collaboration vs. competition spectrum, and a secondary axis, which spans the size spectrum of agent societies [7][21]. Games constitute a simple but powerful form of social environments [9]. Our previous experiments [10], developed on multi-agent game-playing social events, have shown that the socialization of agents produces powerful game-playing synthetic characters. Learning in a game is said to occur when an agent changes a strategy or a tactic in response to new information, thus mimicking human behavior [5][7][9][11][12]. For a game agent, the social environment is represented by a game with all its agents, components and entities, such as rules, payoffs, penalties, etc. [7][8][9][10].

The main scientific motivation of our work stems from the observation that while individuals in any given society generally do compete against each other, attempting to improve their relative position within their group, they also, at the same time, have a vested interest in their group becoming more powerful in the broader societal context. While this is an observation of a social nature, we are interested in providing computational tools for confirming it via simulation; this constitutes our technical motivation. Collective performance improvement via individual competition can be easily observed in a wide variety of group contexts. In a school courtyard, pupils play against each other aiming to top their local ranking table, but also hope to collectively improve their performance when their school faces off against an adversary and individual matches can be scheduled. Chess players in clubs participate in a series of intra-club matches before inter-club face-offs, where each club is represented by some of its members. Football teams participate in national tournaments and, then, champions get a chance to represent their leagues at international events. These are all examples of individual characters working for their own purposes but also sharing a common goal of group improvement. So, for all these contexts, an abstraction is evident: one needs to optimize the effort spent with courtyard colleagues, who act as competitors, before trying to tackle opponents from another group.
There exists, therefore, an interesting co-operation/competition dilemma: you need to practice effectively and efficiently before you actually challenge an unknown opponent. To translate the metaphor into the context of Multi-Agent Systems (MAS) endowed with learning capabilities, one needs to define a “game” to allow two opponents (of varying strength, tactics and motives) to compete against each other, then to create an environment where arbitrary collections of agents compete against each other, given a limited amount of learning resources (time, allowable number of practice games, allowable number of defeats: one can really think of several such resources) and, then, to design an evaluation toolkit to measure how two distinct groups of agents manage their intra-group training with respect to their inter-group face-off. The allowable degrees of freedom for such experiments are more than a few; besides learning resources, one can experiment with a variety of learning mechanism configurations (thus simulating different characters; for example, fast vs. slow learners, risky vs. conservative learners, exploiters vs. explorers, etc.), as well as a variety of opponent selection mechanisms (opting to play against a stronger or a weaker opponent, opting to play against an opponent of unknown stature, etc.), all of them leading to a wealth of social interactions which can be recorded and analyzed with the objective of identifying interesting (or promising) behaviors.

To facilitate the research above, one needs to address two broad directions, each of which gives rise to a different type of research question. On the one hand, one needs to design, experiment with and analyze the results of various learning mechanisms, as well as social interaction mechanisms, to investigate the emergence of social hierarchies and of “best” individuals; this direction is best served by machine learning, data mining and analytics at large. On the other hand, testing individual and group interactions with large numbers of individuals and groups, complex games, diverse playing and learning behaviors, etc.
gives rise to expensive experiments that require the utilization of high performance computing infrastructures and the development of tools to assist with managing the experimentation life cycle (from design, to deployment, and to analysis); hence the need for multi-agent based simulation research and development, which is the main focus of this paper.

To comprehensively experiment with complex simulations of real life, it is now common for available computer resources to be pooled from all over the world [2]. There exist case studies where Grid Infrastructures (GI) provide a unified and powerful simulation environment and where large-scale Multi-Agent Based Simulation (MABS) systems facilitate the study of complex Social Environments (SE) composed of synthetic agent-characters and diverse entities [10][13]; earlier important ones focus on environmental monitoring and analysis systems, developed for a broad range of scientific areas [7][14]. As predicted in [1], grid infrastructures have emerged as a need in many subareas of computing; we can now say that they are not only a part of computer science, but present in almost all systems that need powerful computational resources, across experimental sciences, operations research, environmental science, etc. [15][16][17][18][19][20].

The main contribution of the platform presented in this paper is its ability to distribute experiments over grid infrastructures, speeding up their execution, improving the use of resources and streamlining the collection of experimental results. Our platform can be thought of as filling the middle ground between two powerful extremes: on one hand we have the complicated middleware of grid infrastructures (such as gLite) and on the other hand we have workflow systems, both of which demand a substantial learning curve. We propose a platform where some generic functionality is sacrificed for the benefit of lowering the technical barrier to entry. Our platform parallelizes an experiment by splitting it into sub-experiments running autonomously on available worker nodes of grid infrastructures. It also caters to users with limited expertise in grid infrastructures, as most functionalities are automated, generating log files for each experiment, which may be subsequently analyzed. Additionally, the web-based implementation of the platform provides useful availability and portability, with a friendly Graphical User Interface (GUI), which is an important advantage compared to other platforms.
Furthermore, our paper presents a simple benchmarking of synthetic agents’ performances and analyzes those performances across a range of properties, to demonstrate the practical aspects of using our experiment distribution platform for the purpose of studying how agents behave in social environments (in our demo, data is generated by games between agents in tournaments, which serve as our social events).

The rest of this paper is structured in five sections. The next section provides a brief background of social organization research and high performance computing infrastructures, as well as their relation to multi-agent based simulation systems. The third section introduces the architecture and implementation aspects of our multi-agent based simulation platform for grid infrastructures. The fourth section introduces the games we use as test cases to demonstrate our system; we also present the results of our experiments, as managed from the platform, and we discuss their relevance towards understanding competition vis-à-vis cooperation in artificial social systems. The fifth section presents a brief qualitative comparison between our platform and related systems for multi-agent system experiments. The last section concludes the paper and sets out directions for future work.

2. A LITERATURE REVIEW

High performance computing software and hardware architectures have recently emerged to support large-scale experiments in a variety of fields, including games and multi-agent systems, where desktop or organization-wide resources may be too limited. This section provides a literature review of multi-agent based simulation tools and of multi-agent social organizations in high performance computing infrastructure systems.

Cloud computing systems are a relatively new addition to the high performance computing family [22][23][24]. Whether they will eventually displace grid computing systems is still debatable [22][23][24], but both technologies hold potential for multi-agent systems. Today, researchers from diverse scientific areas who need high performance computing might prefer grid computing infrastructures because of a range of free-access features for the research community. Still, this is not enough, as scientists should spend more time on interpreting experimental results rather than on building and running an experiment in high performance computing systems [25][26]; in this sense grid computing still has a long way to go, and it is in this direction that our work contributes.

Currently, there are several simulation platforms in use by many scientists. Most popular among these are NetLogo, Jason, Repast and Mason, each having its own characteristics and specific applications [6] to help a user gauge their applicability for a particular experimental need. Multi-agent based simulation platforms support the creation and the study of virtual environments and software agents [6][28][29]; in some special cases a platform can simulate software agents trained from real sensor-collected data or from manufacturing systems [29]. Additionally, as will become apparent below, an increasing number of large-scale multi-agent systems are being served by the high performance computing environment and, though many individual differences are present, the underlying architectural concept of putting high performance computing and multi-agent systems together thrives [30][31].

2.1 Grid computing and multi-agent based systems

A recently built agent-based coalition formation model for studying agent trust [31] resulted in a recommendation to adopt a high performance agent-based computing platform [32] for exploring large-scale social simulation scenarios; that work adopted existing systems, such as a multi-agent based simulation platform [13][32] and Repast for high performance computing [33][34]. Researchers from many scientific areas argue that the combination of multi-agent based simulation and high performance computing infrastructures is inevitable when experiments with large agent numbers are to be carried out. This is especially true for agent societies with sophisticated interactions and Collective Intelligence (CI) experiments [30][31][33][34][35]. Decraene et al. [31][36] developed an agent-based system with strong high performance computing capabilities. Due to the high complexity of the underlying experiment, the sizable number of iterations, the diversity of models, and the need for experiment design support and for the analysis of simulations, they were led to combine multi-agent based simulation with high performance computing. The simulation models are represented using XML files, for safety and convenience purposes [37]. That system is layer-based and uses text files as the means of communication between layers [37]. Also, scenario files containing information about experimental settings are managed as XML files. Still, the overall system works as a black-box experimental one, similar to grid infrastructures: it does not allow interference with the experiment during its progress. With Pandora [20], an open-source framework for designing multi-agent system experiments on high performance computing systems, one can witness the deployment of a high performance agent-based modeling framework on clouds.
The net result seems, again, to be inconclusive, as solutions appear highly dependent on budget constraints and case study properties which may be difficult to generalize.

Blanchart et al. [33] present a new multi-agent based simulation platform architecture running on a cluster or grid infrastructure in order to generate sufficient data for scientific use. They recommended that multi-agent based simulation and high performance computing infrastructures be coupled, especially in cases of social simulations with large computational resource needs [33]. Due to the complexity of the environment in their Sworm project [33], an agent-based simulation system was built to model the reproduction of earthworms as influenced by soil structure and nutrient availability. This system uses XML files to produce descriptions of all the possible simulations for the iterations. Cluster access and management were performed over a web-based graphical user interface [33].

The Multi-Agent Modeling Toolkit (MAMT) [38] is a new multi-agent system platform built over some existing well-known tools: the Agent Unified Modeling Language (AUML, an extended version of the Unified Modeling Language for multi-agent systems [39]) and the Java Agent DEvelopment framework (JADE, a platform supporting implementations of multi-agent systems through a middleware [40]). MAMT compares favorably to related platforms [38]; it can store agents in a repository for reuse or export agent code for further development. Basically, MAMT is an advanced form of JADE with a new graphical user interface and additional features.

HLA_Grid_RePast [41] is presented as a middleware platform for executing large-scale experiments on the Grid, based on the RePast MABS platform. The High Level Architecture (HLA) Grid RePast integrates two different middleware systems. At the bottom level there lies HLA Grid, a middleware to support HLA simulations on the Grid [42]; at the top level one finds HLA RePast, which supports the execution of multiple interacting instances of RePast agent-based models within the HLA [43].
The general perspective of this platform is to submit multiple agent-based experiments across specific clusters available via grid infrastructures, equipped with specific software. The platform has been reported to have delivered promising evaluation results on two powerful clusters [41].

The most common limitation of these platforms is that experiments must be run on single Worker Nodes (WN) of computer clusters, and the distribution of the experiments must be programmed by the user. Most administrative domains (User Interface, UI) of grid infrastructures are built on Scientific Linux with an open middleware (usually gLite) for the user, but also with various restrictions. gLite was created by a large community in the EGEE project [15], with a plethora of software libraries, compiler suites, programming languages and tools. Our platform serves to bridge the gap between the sophistication of the distributed computing environments and the technical capacity of the scientific investigator who aims to use such environments.

2.2 Cloud computing and multi-agent based systems

Cloud computing is a quintessential technology for today’s gaming companies, helping to address the availability of infrastructures, to improve scalability, responsiveness and user experience, and to enable new business models [44]. Cloud computing technologies can easily couple with agent-based mobile systems, such as CloneCode [45] and MAUI [46] (a mobile agent is software which can be sent out from a computer to roam a network [47]).

Marzolla et al. [44] present adaptive cloud resource provisioning for Massively Multiplayer Online Games (MMOG). Their system works by adding or releasing resources according to the level of QoS (for example, as measured by latency between servers and clients). However, their system is not pro-active, as they do not consider load prediction when computing the amount of necessary resources. In general, they built a cloud-based monitoring system with response time as the main target quantity. Similarly, a multi-agent based negotiation framework [48] was also prototyped, using cloud computing to provide faster response times, and its effectiveness was verified by simulation experiments on CloudSim [48].

Elastic-JADE [49] is a multi-agent platform based on JADE [40] which aims to allow a local JADE platform to automatically scale up and down using Amazon cloud resources when the local platform is heavily loaded. A similar platform is described in [50], which presents a dynamic resource provisioning and monitoring system. In general, it is a multi-agent system which manages the cloud provider’s resources while taking into account the customers’ quality of service requirements as determined by the service-level agreement (SLA). Dynamic Resource Provisioning and Monitoring (DRPM) systems, in cloud computing architectures, feature new virtual machine selection algorithms, such as the host fault detection algorithm. Al-Ayyoub et al. [50] evaluate their DRPM system using the CloudSim tool; the results show that the DRPM system increases resource utilization and decreases power consumption while avoiding SLA violations [50].

3. PLATFORM ARCHITECTURE AND TECHNOLOGIES

Our platform overcomes the difficulties of executing experiments in grid infrastructures with an easier, quicker and technically robust approach, focusing on large-scale machine learning experiments. It distributes a large experiment into many sub-experiments, allocated over independent Worker Nodes (WN) for better parallelization, and can be deployed over existing grid infrastructures without compromising the security of existing systems.

The platform was developed in Java and its architecture is based on three modular layers (Figure 1(a)), each of which consists of several objects and sub-objects. The bottom layer contains the principal objects of a social environment: the agents, the game, the tournament scheduling systems, etc. The middle layer is the multi-agent based simulation platform, a system which routes communication between the social environment (Objects layer) and the top layer (Monitoring layer). It manages all the attributes of a social experiment by scheduling the matches, processing the agent configuration files and performing other related housekeeping activities, before the beginning and during the execution of the experiment. Additionally, the platform handles the experiment submission process to the Grid and presents key selected status data of the experiments on the graphical user interface of the top layer, the Monitoring layer, thereby creating a visual communication channel between the whole system and the user.
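The three-layer separation can be sketched as follows; this is a minimal illustration only, and all class and method names are hypothetical, since the paper does not publish the platform's actual source code:

```java
// Minimal sketch of the three-layer architecture (illustrative only;
// all names are hypothetical, not the platform's actual classes).
public class Layers {

    // Bottom (Objects) layer: principal objects of the social environment.
    static class SocialEnvironment {
        final String game;
        final String[] agents;
        SocialEnvironment(String game, String... agents) {
            this.game = game;
            this.agents = agents;
        }
    }

    // Middle (simulation) layer: routes data between the social
    // environment and the monitoring layer.
    static class SimulationLayer {
        String summarize(SocialEnvironment env) {
            return env.game + " experiment with " + env.agents.length + " agents";
        }
    }

    // Top (Monitoring) layer: presents status data to the user.
    static class MonitoringLayer {
        String display(String status) {
            return "[GUI] " + status;
        }
    }
}
```

The point of the sketch is the direction of data flow: the middle layer reads the environment and produces status data, which only the top layer renders for the user.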


3.1 Combining grid systems with web-based graphical user interface tools

Several projects have been developed with multi-agent based simulation systems running on high performance computing infrastructures as monitoring tools [3], for studying agents’ performance in simulation experiments [4]. Many of these projects are built in the classic manner of gLite or Globus middleware, while others are built through web-based workflow tools and still others are directly built as autonomous web-based applications. Almost all existing case studies of multi-agent systems on the grid do not provide any human interaction with the experimental environment during experiment execution: they provide a simple flag system for status information and data collection (the latter only after the end of the experiment), as almost all of these systems run on single worker nodes. This is an important disadvantage for large-scale experiments, as any distribution of the experiments has to be programmed by the users (which is also the case for any need to integrate with external distribution libraries). This limitation is overcome by the platform we present in this paper.

To bridge the interaction (communication) gap between a human monitoring expert and a multi-agent system running on the grid, we use real-time bidirectional synchronization of generated experiment data between the grid storage elements, the grid user interface and a web server (running on a personal computer, for example), with all communication, management and control information stored in XML files. Research in this type of architecture facilitates human interaction with large-scale multi-agent system simulation experiments by providing an autonomous domain space for each user, where neither data security nor system security is compromised (we use standard tools available to all grid infrastructure users).
One could argue that, by having two different computer systems for the communication of the web server and the grid administrative domains, we likely increase synchronization time; however, this will probably not affect the monitoring experience, because it is practically impossible for a user to visualize all game executions “on the fly” anyway.
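To illustrate, a status file synchronized between the grid user interface and the web server might look like the following sketch; the element names and values are hypothetical, since the paper does not publish the platform's actual XML schema:

```xml
<!-- Hypothetical status file; element names are illustrative only. -->
<experiment id="TourA">
  <job id="job-0042">
    <agents>agent03 vs agent07</agents>
    <iterations>1000</iterations>
    <status>Running</status>
    <workerNode>wn042.cluster.example.org</workerNode>
    <started>2015-03-02T14:05:00Z</started>
  </job>
</experiment>
```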

3.2 Large-scale multi-agent experiment distribution

This section describes the segmentation of a large-scale multi-agent system experiment into sub-experiments running in parallel, to optimize resource usage. Due to the nature of our experiments, their segmentation is implemented by creating matches as autonomous grid jobs. A social event experiment consists of many matches between the participants. Each match is an autonomous job with input and output data, managed by the multi-agent based simulation platform, located in the administrative domain (user interface) of the grid infrastructure. Each job runs on a random free worker node of a random free cluster of the grid infrastructure (in principle, a user has no control over which worker node is selected for a given job).

After transferring the experiment set-up to the grid administrative domain, segmentation proceeds by implementing the round robin algorithm among all agents. The large folder icon over the administrative domain in Figure 1(b) represents the multi-agent system experiment and its input and output data. After the segmentation of the experiment, the platform transfers all necessary data of each match–job to the storage element, in parallel (the blue floppy disk icons on the left in Figure 1(b)). During data transfer, the system submits the autonomous jobs and finds the available clusters of the high performance computing infrastructure through the Workload Management System (WMS). Thereafter, the Computing Element (CE) of each cluster searches for free worker nodes while transferring the necessary data of the experiment from the grid storage element to its own storage element. When an available worker node is found, the job starts after all required data is fed from the storage element to that worker node. After the job starts, a unique Job ID is created and communicated to the user interface and to the web-based graphical user interface for monitoring the status of the job. After the end of the job, a reverse data flow brings the result of the experiment and all newly generated data to the user interface (green floppy disk icons), to be made available to subsequent stages.
This process is the same for all the matches of a social event. A key point of the proposed platform is that it does not require any specialized grid usage knowledge. An alternative conventional implementation of a similar segmentation scheme, with existing grid infrastructure features, would likely require the generation of dozens of specialized shell scripts for each job, along with special logic to counter the fact that a worker node typically resets every 24 hours. It should be obvious that avoiding data loss and experiment restarts due to such problems would demand a thorough command of grid infrastructure systems and programming.
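The round robin segmentation described above can be sketched in a few lines of Java; this is a minimal illustration (not the platform's actual code) that enumerates each pair of agents exactly once, so that each pairing becomes an autonomous grid job:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of round-robin job segmentation (illustrative, not the
// platform's actual code): every pair of agents meets exactly once, and
// each resulting pairing would be submitted as an autonomous grid job.
public class RoundRobin {
    public static List<String[]> pairings(List<String> agents) {
        List<String[]> jobs = new ArrayList<>();
        for (int i = 0; i < agents.size(); i++) {
            for (int j = i + 1; j < agents.size(); j++) {
                jobs.add(new String[] { agents.get(i), agents.get(j) });
            }
        }
        return jobs;
    }
}
```

For n agents this yields n(n-1)/2 match–jobs; for example, 4 agents produce 6 jobs, each of which can be dispatched to a free worker node independently.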


Figure 1. (a) Platform layers, (b) Segmentation of a large-scale experiment to sub-experiments.


3.3 The platform management toolkit

Access to grid infrastructures is established through an internet access terminal; the multi-agent based simulation platform services and the grid services can be managed through a web-based Java application. This application implements SSH-based communication, through which all prerequisite grid commands can be submitted and executed. The application was built over the existing infrastructures, aiming for safer and faster access to the grid and to the new multi-agent based simulation services we developed.
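A minimal sketch of such an SSH channel is shown below; it is illustrative only (not the platform's actual code), it shells out to a local `ssh` client rather than using the platform's own SSH implementation, and the user name, host and job id are placeholders:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not the platform's actual code): submitting a grid
// command to the remote user-interface machine over SSH. User, host and
// job id are placeholders.
public class GridSsh {

    // Pure helper: the argument vector for running a gLite status query
    // on the remote grid user interface.
    public static List<String> statusCommand(String user, String host, String jobId) {
        List<String> cmd = new ArrayList<>();
        cmd.add("ssh");
        cmd.add(user + "@" + host);
        cmd.add("glite-wms-job-status " + jobId);
        return cmd;
    }

    // Executes the command and returns its combined output.
    public static String run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        String out = new String(p.getInputStream().readAllBytes());
        p.waitFor();
        return out;
    }
}
```

In the same spirit, any other prerequisite grid command (job submission, data transfer to the storage element, and so on) can be wrapped as an argument vector and executed over the same channel.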

Figure 2. Experiments management graphical user interface.

Figure 2 presents the graphical user interface toolkit for the top layer of the platform (see Figure 1(a) and (b)), used for accessing the grid infrastructures and managing the multi-agent based simulation application. It is separated into two column panels, A and B, each consisting of several layers.

Panel A represents the SSH security sub-systems, which enable communication between the two computers through commands, grid user interfaces and user terminals. Initially, the A1 layer is used for connecting to the remote computer (in our case, the grid user interface), where the remote domain information should be entered. After a connection request, the A3 layer (an information window) informs the user about any responses from the remote domain. All available commands of the user interface, the SSH protocol and the multi-agent based simulation platform can be entered, as text, in the input field of the A2 layer, and the results may subsequently be retrieved from the A3 layer. Generally, the A2 layer is the input of the SSH systems and the A3 layer is the output.

Panel B contains three layers which correspond to experiment execution, experiment distribution monitoring (and management) and experiment evolution monitoring. After a user successfully connects to the system, as presented in the previous paragraph, the button “Init” of the B1 layer should be pressed in order to initiate grid activities; this initializes the customizations of the grid by issuing specific grid infrastructure commands (for example, declarations of virtual organizations or storage element usage, which can anyway be manually submitted from panel A too). Pressing the green button of layer B1 automatically loads into the drop-down list on the left all available experiments of the user located in the user interface. Selected experiments can be managed from the rest of the components of the application. When starting a new experiment selected from the drop-down list, the number of iterations per match is input in the corresponding field and then submitted via the “Start” button. Each time the “Update” button is pressed, the application automatically loads the status of the selected experiment into layers B2 and B3. Layer B2 displays information in tabular form about each job–match of the experiment. This information includes the agents’ names, the number of iterations per match, the step number of the round robin sequence, the job location (worker node), the status, the unique job id, and indicative start, stop and delay dates. Selecting a job row and pressing the “Get” button opens a new browser tab with more detailed information about the cluster, the worker node and the job status.
Users may easily stop a job–match by selecting the corresponding row in layer B2 and pressing the “Kill” button; in that case, the multi-agent based simulation platform re-submits the same job to different resources after a few seconds. The termination of a job may sometimes be necessary, due to long waits or other problems which spring up in the grid infrastructure.


Figure 3. Multi-agent based simulation platform sequence diagram.

The complete functionality and operating procedures of the platform are shown in the sequence diagram of Figure 3, wherein the three layers of the platform are shown separately to highlight their individual functionalities. Essentially, this diagram represents the gradual evolution of a large-scale experiment. By having all these technologies integrated into a single platform managed from a remote web-based graphical user interface, the total management of a large-scale multi-agent system experiment becomes easier, quicker and more accurate, overcoming the obstacle of black-box experiment executions which are the norm in grid infrastructures. Note that there is no need for additional software; moreover, the web-based implementation of the platform makes it portable and enhances its availability across a variety of front-end technologies.


4. EXPERIMENTS AND EVALUATION

To demonstrate the savings in experimentation management, as well as the fact that such savings facilitate the implementation of larger, more complex experiments, we tested our platform with two different games, a simple one and a more complex one. Besides highlighting the savings, we also aim to show that more complex experiments allow one to uncover non-trivial results in how agents' behavior evolves over time and over a variety of opponents. We start with the Rock Scissors Paper (RSP) [51] game, a simple game with few rules and few attributes, and follow up with RLGame [52], a more complex one, which has already been the subject of a variety of investigations as regards playing tactics and learning-to-play behaviors. While we investigated RSP mainly as a means to highlight the steps to be taken so that our platform supports the deployment of any third-party zero-sum competition, in this paper we focus our reporting on RLGame, which is far more complex and offers ample room to investigate more complex learning and playing behaviors.

To enhance the credibility of the results we ran two independent but identically configured tournaments, TourA and TourB; our results confirm concurring trends between these two runs. The performance of an individual agent is measured by its grade, accumulated over all games played during the tournament: each win contributes +1 and each loss contributes -1 to the grade. The agent with the largest grade wins the tournament. In the following subsections, we present the experiments using the platform, analyze their results from the social learning point of view, briefly discuss the validity of the results and, also, evaluate the platform.
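The grading scheme just described can be captured in a few lines; the agent names and the match outcomes below are purely illustrative.

```python
def tournament_grades(results):
    """Tally tournament grades from (winner, loser) match outcomes:
    each win contributes +1 and each loss -1 to an agent's grade."""
    grades = {}
    for winner, loser in results:
        grades[winner] = grades.get(winner, 0) + 1
        grades[loser] = grades.get(loser, 0) - 1
    return grades

# The tournament winner is simply the agent with the largest grade:
#   max(grades, key=grades.get)
```

With a full round robin schedule, every pair of agents contributes exactly one match outcome per game played.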

4.1 Test games for evaluation

This section describes the games we employ for evaluating our system and the results of our experiments managed from the platform. Rock Scissors Paper [51] is a zero-sum game widely used [8][21] as a tool for various studies. In this game, each agent chooses, without the other agent's knowledge, one of three possible moves: rock, scissors or paper. After the moves have been selected, they are revealed, and the winner is determined according to a simple set of rules that have a playful interpretation: rock "breaks" scissors, scissors "cuts" paper, and paper "covers" rock. If the same move is played by both agents, the game is declared a draw. A recent investigation of learning algorithms with RSP as the underlying game involves an agent repeatedly playing against an exploitable opponent; that work, Active-LZ [53], combines reinforcement learning with a Lempel-Ziv prediction scheme to produce an agent that is provably asymptotically optimal if the environment is n-Markov. RSP has also been the subject of computer tournaments for several years [54]. In this work, we use RSP as a simple third-party game to demonstrate the ease of adapting our multi-agent based simulation platform to any zero-sum turn-based two-player game.

RLGame is a strategy board game [52] played by two agents and their pawns on an n x n square board. Two a x a square bases are located at opposite board corners; these are initially populated by β pawns for each agent. The white agent starts the game from the lower left base and the black agent from the upper right one. The goal of each agent is to move a pawn into the opponent's base or to force all opponent pawns out of the board.
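Returning to RSP for a moment, its outcome rules reduce to a single dominance relation; the following minimal sketch (with moves encoded as strings of our choosing) captures the whole rule set.

```python
# Each move "beats" exactly one other move: rock breaks scissors,
# scissors cuts paper, paper covers rock.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def rsp_winner(move_a, move_b):
    """Return 0 for a draw, 1 if agent A wins, 2 if agent B wins."""
    if move_a == move_b:
        return 0                       # same move by both agents: draw
    return 1 if BEATS[move_a] == move_b else 2
```

This simplicity is precisely why RSP served as a convenient first test for plugging a third-party game into the platform.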

Figure 4. Example of game rules application.

Some of the most important rules of RLGame are:

• A pawn can move vertically or horizontally to an adjacent free square, provided that the maximum distance from its base is not decreased (so, backward moves are not allowed).

• A pawn can move out of the base to any adjacent free square.

• An agent can move only one pawn at a time.

• A pawn that cannot move is lost.

• An agent also loses by running out of pawns.

The implementation of some of these rules is illustrated in Figure 4. In the leftmost board of Figure 4, the pawn indicated by the arrow demonstrates a legal ("tick") and an illegal ("cross") move, the illegal move being due to the rule that does not allow decreasing the distance from the home (black) base. The rightmost boards demonstrate the loss of pawns, with arrows showing pawn casualties: a "trapped" pawn is automatically removed from the game, whether in the middle of the board or, as for the white pawn, when there is no free square next to its base.

The learning mechanism of each agent is based on approximating its (reinforcement learning inspired) value function with a neural network (NN) [10][55]. Figure 5 demonstrates how the neural network maps board snapshots to values for the temporal difference (TD) [55] learning mechanism of each agent. The input-layer nodes encode the board positions for the next possible move, plus some flags on overall board coverage. The hidden layer consists of half as many nodes. A single node in the output layer denotes the expectation of winning when one starts from a specific game-board configuration and then makes a specific move. At the beginning, all states have the same value except for the final states. After each move, the values are updated through the temporal difference learning method, which combines Monte Carlo and dynamic programming ideas [55]. As a result, collective training is accomplished by pitting an agent against other agents, so that knowledge (experience) is accumulated.
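The temporal difference update at the heart of this mechanism can be illustrated with a tabular stand-in for the neural network; the dictionary below plays the role of the value function, and for brevity the sketch shows the λ=0 case (no eligibility traces), whereas RLGame uses TD(λ).

```python
def td_update(values, state, next_state, reward, alpha, gamma):
    """One TD(0) update: move V(state) toward the bootstrapped target
    reward + gamma * V(next_state). In RLGame the lookup table is
    replaced by a neural network, but the update rule is the same idea."""
    v = values.get(state, 0.0)
    target = reward + gamma * values.get(next_state, 0.0)
    values[state] = v + alpha * (target - v)
    return values
```

Repeated application over the moves of many games is what gradually shapes each agent's estimate of the winning chances of board configurations.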


Figure 5. Learning mechanism of RLGame.

Since the backbone of an agent's knowledge is the combination of its personal TD algorithm and the artificial neural network which approximates its value function, different parameters for the knowledge mechanism correspond to a variety of playing "characters", encompassing fast/slow learners, risky/conservative agents, etc. Three characteristic values capture the character differentiation of the agents. The policy for selecting between moves is the ε-Greedy (ε) policy [56], with ε denoting the probability of selecting the best move according to present knowledge (exploitation) and 1-ε the probability of a random move (exploration). The learning mechanism is associated with two additional parameters, Gamma (γ) and Lambda (λ) [57]. Risky or conservative agent behavior is associated with the γ discount rate parameter, which specifies the learning strategy of the agent and determines the values of future payoffs, with values in (0,1); effectively, we associate large values with long-term strategies. The speed and quality of agent learning are associated with λ, the learning rate of the neural network, also with values in (0,1). For example, small values of λ can result in slow, smooth learning; large ones could lead to accelerated, unstable learning. These properties are what we henceforth term "characteristic values" for the playing agents; different combinations lead to different types of players, as indicated in Figure 6.
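Note that this convention inverts the usual reading of ε-greedy: here ε is the probability of exploiting. A sketch of move selection under that convention follows; the `value_of` callback, standing in for a query to the agent's value function, is our assumption.

```python
import random

def select_move(moves, value_of, epsilon, rng=random):
    """Pick a move under the paper's convention: with probability epsilon
    exploit (the highest-valued move by present knowledge), otherwise
    explore with a uniformly random move."""
    if rng.random() < epsilon:
        return max(moves, key=value_of)   # exploitation
    return rng.choice(moves)              # exploration
```

With ε close to 1 an agent almost always plays its currently best-looking move; lowering ε trades that off for exploration.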


Figure 6. Characteristic Values of learning mechanism of RLGame.

Initial experiments have demonstrated that, when an agent is trained with self-playing, both agents converge to having nearly equal chances to win [13], and that self-playing achieves weaker learning performance compared to what computer playing against a human player would eventually achieve, even with limited human involvement [55]. Additionally, in further experiments, a social organization implementation of RLGame has suggested that socially trained agents are stronger than self-trained ones [10].

4.2 Learning to play: concepts and tools for social interaction

This section briefly recounts the systems we have built for investigating how agents may learn to play by mixing with their peers in competitions; it will also help further motivate the introduction of our novel MABS platform, as already indicated by our generic comments on how other researchers have prototyped their MABS tools. In particular, as far as the contribution of Reinforcement Learning to games is concerned, we note that this technique has always been a prime example of applying AI in simulated societies. Shannon [58] and Samuel [59] provided the first stimulating examples; then Deep Blue defeated Kasparov at chess in 1997 [60] and, more recently, Schaeffer's team solved checkers completely [61], though the latter two are more indicative of the strength of search techniques. On the other hand, arguably one of the strongest contributions of Reinforcement Learning is the TD(λ) method for temporal difference learning [55][57], successfully demonstrated in TD-Gammon for the game of backgammon [62]; therein, using reinforcement learning techniques and self-playing, a performance comparable to that of backgammon world champions was achieved.

RLGame was initially presented as a competition between two agents [52]. Subsequently, for the needs of social learning experiments, RLGame was transformed into a tool for studying multi-agent systems via its tournament version, RLGTournament [10], which implements a Round Robin (RR) scheme to pair participants against each other. RLGTournament fits the description of both an autonomous organization [7] and a social environment [5][9][11].

4.2.1 An indicative experiment for a social learning event

The social learning event is initiated with 126 agents in a round robin tournament with 100 games per match; each agent thus plays 125 matches against different agents. All agents had different character value configurations for ε, γ and λ, with values ranging from 0.6 to 1.0 with an increment step of 0.1. The value of 0.99 was used instead of 1.0, and a trivial agent with a (1,1,1) configuration for (ε,γ,λ) was also added, accounting for 5×5×5+1 = 126 agents.
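The population set-up can be reproduced directly. Note also that a 126-agent round robin implies 126·125/2 = 7875 pairwise matches, which is exactly the nominal job number reported in Table 4.

```python
from itertools import product

def agent_configurations():
    """Enumerate the 126 (epsilon, gamma, lambda) configurations:
    five grid values per parameter (0.99 standing in for 1.0),
    plus the single trivial (1, 1, 1) agent."""
    grid = (0.6, 0.7, 0.8, 0.9, 0.99)
    configs = list(product(grid, repeat=3))
    configs.append((1.0, 1.0, 1.0))
    return configs

n = len(agent_configurations())
matches = n * (n - 1) // 2    # round robin: every pair of agents meets once
```

Each such match becomes one grid job in the experiment, which is what makes the distribution machinery of the platform essential.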

Table 1. Agent classification.

The results of both experiments are summarized using the agents' grades and shown in Table 1. Three classes of agents are highlighted: Good, Moderate and Bad playing; these classes are also used in Table 2 and Figure 7. To determine class boundaries for both experiments, we split the interval of grade values between the best and worst values into three equal parts, as annotated by the green, white and red columns. Table 1 also shows the number of agents in each class according to their grades. It is evident that the class with the best playing agents (A-Good) is composed of the fewest agents, which applies to both experiments. The automatically produced log files of the experiment are analyzed with the Orange Data Mining tool [63].
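Splitting the grade interval into three equal parts can be sketched as follows; the exact convention at the cut points (closed on the upper side) is our assumption, and the agent names are illustrative.

```python
def classify_agents(grades):
    """Map each agent name to a class label by splitting the
    [worst, best] grade interval into three equal parts."""
    lo, hi = min(grades.values()), max(grades.values())
    third = (hi - lo) / 3.0

    def label(g):
        if g >= hi - third:
            return "A-Good"
        if g >= lo + third:
            return "B-Moderate"
        return "C-Bad"

    return {name: label(g) for name, g in grades.items()}
```

Applied to the full grade lists of TourA and TourB, such a split yields the class memberships summarized in Table 1.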

               |      |                    TourA                          |                    TourB
Class          | Rank | Name   E-Greedy  Gamma  Lambda  Grades  AVG_MOVES | Name   E-Greedy  Gamma  Lambda  Grades  AVG_MOVES
A -> Good      |   1  | Plr3   0.6       0.6    0.8       4004  30.04976  | Plr1   0.6       0.6    0.6       3910  28.69576
               |   2  | Plr6   0.6       0.7    0.6       4002  29.9796   | Plr8   0.6       0.7    0.8       3788  32.46296
               |   3  | Plr7   0.6       0.7    0.7       3974  34.72352  | Plr11  0.6       0.8    0.6       3702  31.35928
               |   4  | Plr8   0.6       0.7    0.8       3556  35.39648  | Plr3   0.6       0.6    0.8       3696  33.1976
               |   5  | Plr23  0.6       0.99   0.8       3548  41.69856  | Plr7   0.6       0.7    0.7       3668  30.65832
B -> Moderate  |  61  | Plr76  0.9       0.6    0.6       -244  39.87752  | Plr76  0.9       0.6    0.6       -222  38.22688
               |  62  | Plr87  0.9       0.8    0.7       -268  38.6636   | Plr67  0.8       0.9    0.7       -290  41.50448
               |  63  | Plr69  0.8       0.9    0.9       -276  43.91168  | Plr86  0.9       0.8    0.6       -324  40.9668
               |  64  | Plr86  0.9       0.8    0.6       -308  40.2708   | Plr87  0.9       0.8    0.7       -340  38.99816
               |  65  | Plr81  0.9       0.7    0.6       -328  40.32136  | Plr92  0.9       0.9    0.7       -412  41.79024
C -> Bad       | 122  | Plr30  0.7       0.6    0.99     -2850  46.89416  | Plr45  0.7       0.9    0.99     -2600  52.74032
               | 123  | Plr20  0.6       0.9    0.99     -2926  55.7484   | Plr15  0.6       0.8    0.99     -2766  54.98344
               | 124  | Plr35  0.7       0.7    0.99     -3140  48.62968  | Plr30  0.7       0.6    0.99     -2920  49.82832
               | 125  | Plr45  0.7       0.9    0.99     -3264  47.34064  | Plr50  0.7       0.99   0.99     -3064  52.22016
               | 126  | Plr15  0.6       0.8    0.99     -3328  52.41616  | Plr20  0.6       0.9    0.99     -3692  51.44688

Table 2. Detailed results of both experiments.

Table 2 includes the ranking positions, the synthetic agent characteristics, the grades and the average moves of five representative agents of each class from both experiments. The first two columns present the classes and the ranking positions of the agents. Besides these two columns, Table 2 is separated into two sub-tables - one for each experiment - where the data of each is associated with the ranking and class columns. The colors, from green to red through white, in each (ε,γ,λ) column of each experiment sub-table present the classification of the data from the lowest value to the highest; exactly the opposite coloring is used for AVG_MOVES, where green represents a small and red a high number of moves. The colors of the "Name" column, for both experiments, show the correlation between synthetic agents according to their characteristics, grades and classes: for example, blue marks the agents which were classified in the same classes in both experiments, in relation to their ranking positions, while yellow marks agents that moved in from other classes. The "Grades" column shows each agent's grades and performance with arrows, which are associated with their ranking positions.

The grades of each agent in the two experiments are very similar, and their average moves correlate with their ranking positions. Considering the small sample presented - the five listed agents of each class, 15 agents in total - 9 of them are common to both experiments; that is, 60% of the agents are ranked in the same classes and in similar positions. This supports the reliability of our social events, the effectiveness of the synthetic agent differentiation and the usability of our platform.

Class          Name   E-Greedy  Gamma  Lambda  Grades  AVG Moves
A -> Good      Plr7   0.6       0.709  0.72      3821  32.69092
B -> Moderate  Plr87  0.88      0.78   0.67      -304  38.83088
C -> Bad       Plr15  0.66      0.809  0.99     -3510  51.93152

Table 3. Synthetic agents related to the average of ε-γ-λ.

Averaging the ε-γ-λ values of each class over both experiments of Table 2 produces the characteristic values of Table 3. These were matched to the agent of each class in Table 2 with the closest characteristic values; that agent's grade and average moves are shown in the last two columns. In this sense, Table 3 presents the most representative characteristic values and synthetic agents of each class. Put simply, the best, the moderate and the worst agent characteristics can be read off Table 3, and they are further illustrated, through performance and moves, in the graphs of Figure 7.


Figure 7. Agent allocation based on grades and characteristic values (ε-γ-λ).

Figure 7 is composed of four graphs, two per experiment: the two on the top present the results of the TourA experiment and the two below the results of the TourB experiment. The colored bars to the right of each graph, from red through orange to green, classify the agents from Bad (red) through Moderate (orange) to Good (green) with respect to performance and average moves per game. Each bullet on a graph corresponds to an agent; the short line segments around each bullet indicate the three characteristic values (ε-γ-λ), according to their placement and direction. The three large circles highlight agent clustering in accordance with the three clusters proposed in Table 1, with matching colors. Three small rectangular tables in the top left graph present three random agents, to provide visual insight into how the lines are drawn, i.e. what they mean as regards how the agents are located in the graph. The graphs also feature the top agents of each class from Table 3: the small green triangles show the positions of Plr7 (0.6, 0.7, 0.7), the small orange circles those of Plr87 (0.9, 0.8, 0.7) and the small red squares those of Plr15 (0.6, 0.8, 0.99). The left graphs show agent clustering according to performance (grades); the right graphs show agent clustering according to average moves per game.

4.3 Evaluation and brief discussion

We now briefly evaluate our platform as regards its two main features, namely the ability to streamline and facilitate experiments more effectively and the ability to exploit such effectiveness for running sizable experiments which can then uncover interesting social learning patterns. On the productivity enhancement front, Table 4 summarizes the high-level resource consumption results for one of the RLGame experiments (TourB). The first column shows the expended CPU time (measured in hours) and the second column reports the total elapsed time. The third column shows the total number of grid jobs actually submitted for the experiment, whereas the fourth column reports the absolute minimum required number of individual jobs.

CPU Time (hours)  Elapsed Time  Actual Job Number  Nominal Job Number
6235              8556          8664               7875

Table 4. Statistics from EGI infrastructures.

Table 4 highlights that nearly 10% of the originally planned grid jobs had to be re-scheduled, for a variety of reasons, and that there is a sizable difference between the elapsed time and the actual CPU time. Note that in a conventional system such problems typically have to be handled by the user, whereas in our platform the system handles such exceptions and resubmits problematic jobs. This amounts to a substantial productivity enhancement, as such problems are inherent in heterogeneous distributed infrastructures and can be due to a high number of requests from users, occasional worker node resets, jobs hitting the maximum wall time, and data losses (to name a few).


On the multi-agent systems learning front, we believe it is rather clear that the cluster with the small number of agents (the good playing agents) converges to medium-to-large values of γ, small values of ε and medium-to-small values of λ. This suggests that agents with those characteristics prefer to use their experience a bit more than to explore new strategies and also seem to prefer longer-term strategies, sacrificing speed for quality, whereas fast and short-term learners seem to fare less well. Attempting a crude but concise characterization, it could be said that good playing agents act wisely, temporary risk-takers and explorers are associated with moderate playing performance, and fast learners which do not attempt to enhance their knowledge with new strategies turn out to play poorly. This is an additional indication that the collective knowledge build-up of an agent depends both on its environment and on its initial set-up. Moreover, good playing agents prefer a moderate number of moves per game, which may mean that they are less spontaneous. Furthermore, a large λ does not really seem to improve performance and also produces pointless extra moves. In summary, the experiments indicate that the triple (ε,γ,λ) characterizing a player is a succinct representation of its approach to playing and learning. Previous works [10][13][27] have suggested that social learning is more powerful than self-learning; the large-scale experiments set out in this paper also support this suggestion (Figure 7) and show how agent differentiation affects learning effectiveness in social events. Of course, while social learning experiments in multi-agent systems seem promising in uncovering interesting patterns, such large-scale experiments can hardly be implemented on simple computers or small clusters.

5. A COARSE COMPARISON OF PLATFORMS

Direct comparison of the proposed platform with similar systems, such as the ones presented in Section 2, is a difficult undertaking, since "competing" systems mainly focus on the functionalities they provide, whereas our work extends such functionalities to be made available over grid infrastructures, which is far from a simple demand. This brings our platform closer to workflow systems [64] which, however, also demand a substantial total cost of ownership, ranging from installation overheads to user familiarization sessions; though less fearsome than command-line grid interfaces, they still demand a high level of expertise. Our platform is situated in the middle ground, offering a convenient compromise: it sacrifices some of the generic technical capabilities of grid and grid workflow systems for the benefit of enhanced accessibility and ease of use over a slightly narrower experimental domain. As such, we feel that the key pillars of an evaluation have to focus on the productivity enhancements and on the observation that such enhancements allow sizable experiments to be carried out, in order to generate non-trivial research findings; section 4.3 already briefly evaluated our platform with respect to these two aspects. We have briefly reviewed several related systems in section 2; most of them can be succinctly described as large packages or libraries with several substantial capabilities but also with several limitations, which our platform addresses. A high-level but broad qualitative comparison highlights some key advantages of our platform:

• The grass-roots Java development, with the exploitation of the standard SSH protocol and of the standard grid middlewares (gLite, Globus), has partly increased the complexity of our development but has also meant that our platform is much more change-resilient, as we do not require external libraries or tools, which still strive to penetrate the multi-agent systems community. This is in stark contrast to the MAMT platform, which is built over JADE and AUML, as well as to Elastic-JADE; these do demonstrate a more layered architecture, utilizing various third-party components, but at the expense of simplicity and, possibly, maintainability. A similar argument applies when we compare our platform to the EPIS project, which requires several server-based services, and to Pandora, which is a generic library for building applications. Similarly, HLA_Grid_RePast is developed over the RePast agent based modeling platform; again, there is a trade-off which sacrifices simplicity and practicality for conceptual clarity.




• Our platform is effective in distributing experiments over virtually unlimited computational resources all over the world, scaling experiments up to virtually unlimited agent numbers, and supplying on-the-fly experiment monitoring information for all activities in running worker nodes. Most of the platforms described in section 2 are better suited to a single-site approach, running in clusters or clouds, with relatively less information about the distributed jobs. Of course, grid workflow systems are more general than our platform when viewed from the distributed experimentation angle but, still, their generic nature comes at the (very substantial) expense of simplicity and ease of use.



• Our platform accommodates a bottom layer which can utilize agent modules developed in any programming language that can be called from Java (whereas, for example, Repast HPC requires Logo-style C++ and specific libraries). This feature explicitly promotes the sustainability of legacy game agent modules, which can be tailored with modest effort to match our platform's input specifications.



• The web-based graphical user interface provides wide portability and usability. A user has access to full status information and can manage all sub-experiments in running worker nodes. Additionally, all problematic jobs (sub-experiments) can be monitored and rescheduled, if required, in a timely manner (thus mitigating the productivity costs of lost jobs, or of jobs queued for a long time in worker nodes which were made unavailable).

This platform focuses on experiments with substantial computational resource needs which can be segmented into parallel sub-experiments without affecting the overall experiment and its results. It does not override any of the powerful services (middleware) of grid infrastructures but, on top of exploiting these services, it is ideally suited to contexts where multi-agent based experiments are required, with agents sometimes co-operating and sometimes competing. To stress this suitability, the bottom layer of the platform is open for integration with a wide range of libraries and programming languages.


6. CONCLUSIONS

The emergence of high performance computing (cloud and grid) technologies provides exciting opportunities for large-scale distributed simulation systems and experiments. The proposed platform overcomes several usage problems of modern grid systems; while it does not override any of the powerful services (middleware) of grid infrastructures, it serves as an additional tool with a focus on large-scale experiments with substantial demands on computational resources. By taking stock of progress in HPC workflow systems, we have demonstrated a layered structuring of a multi-agent based simulation platform to help researchers in the design, implementation and analysis of social learning experiments. A distributed-computing application can thus be customized by a human monitoring expert who controls the execution of an experiment through a web-based graphical user interface. The parallelization of the sub-experiments, via their assignment to independent worker nodes, and the operation of the platform as an autonomous web-based application with no need for additional software, are the main advantages over existing similar systems. The experimentation and the subsequent evaluation results highlight that the platform can easily accommodate large-scale experiments running on grid infrastructures. However, the evaluation results also highlight that improving the handling of lost grid jobs (and of the resulting delays) is a critical point as far as platform usability is concerned. Additionally, the ability to customize the lower level of the platform according to the needs of a particular experiment is an extra attraction for other scientific areas which are also based on long (and error-prone) experimental sessions.
As future work on the platform, we expect to allow the user to more actively manage experiments, thus integrating functionalities already present in complex workflow systems, and to more actively manage the computational resources (such as worker or storage node selection), allowing more knowledgeable users to get more involved in the mechanics of the distribution mechanism of the grid. That direction of future work inevitably moves toward integrating support for conventional local clusters and clouds.

To exploit the usefulness of the platform, however, we also maintain a research front that revolves around experimentation with specific games, in order to design, schedule and analyze learning/playing scenarios, allowing for more complex scenarios (for example, with restrictions on the type of agent interactions), with the expectation that the massive scale of agent interactions will uncover interesting playing behaviors.

ACKNOWLEDGMENTS

This work used the European Grid Infrastructure through the National Grid Infrastructures NGI_GRNET, HellasGRID, as part of the SEE Virtual Organization. A demo of the platform, alongside instructions for integrating it with third-party games, can be found at http://www.splendor.gr/platform. The source code of the Rock Scissors Paper game is also available to users, as an example of the platform's functionality and as a guide for the implementation of further applications.

REFERENCES

[1] I. Foster and C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure, 2nd ed., Morgan Kaufmann, San Francisco, CA, USA, (2003).

[2] RIKEN HPCI Program for Computational Life Sciences, Largest neuronal network simulation achieved using K computer, http://www.riken.jp/en/pr/press/2013/20130802_1/ (Accessed 18 August 2013).

[3] D. Mengistu, L. Lundberg and P. Davidsson, Performance Prediction of Multi-Agent Based Simulation Applications on the Grid, International Journal of Computer, Information Science and Engineering, Vol 1, No 3, (2007).

[4] T.J. Ingo and D. Pawlaszczyk, Large Scale Multiagent Simulation on the Grid, IEEE International Symposium on Cluster Computing and the Grid (CCGRID), Cardiff University, Cardiff, UK, 2005.


[5] N. Gilbert and K.G. Troitzsch, Simulation for the Social Scientist, 2nd ed., Open University Press, (2005).

[6] A.J. Rob, Survey of Agent Based Modelling and Simulation Tools, Technical Report DL-TR-2010-007, (2010).

[7] J. Ferber, Multi-Agent Systems: An Introduction to Distributed Artificial Intelligence, Addison-Wesley, (1999).

[8] Y. Shoham and K. Leyton-Brown, Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, Cambridge University Press, (2009).

[9] V.N. Marivate, Social Learning Methods in Board Game Agents, IEEE Symposium on Computational Intelligence and Games (CIG '08), Perth, Australia, 2008, pp. 323-328.

[10] C. Kiourt and D. Kalles, Social Reinforcement Learning in Game Playing, IEEE International Conference on Tools with Artificial Intelligence, Nov. 7-9, Athens, Greece, 2012, pp. 322-326.

[11] A. Caballero, J. Botia and A. Gomez-Skarmeta, Using Cognitive Agents in Social Simulations, Engineering Applications of Artificial Intelligence, Vol 24, Issue 7, (2011), pp. 1098-1109.

[12] B. Al-Khateeb and G. Kendall, Introducing a Round Robin Tournament into Evolutionary Individual and Social Learning Checkers, Developments in E-systems Engineering, Dubai, United Arab Emirates, December 2011.

[13] C. Kiourt and D. Kalles, Building a Social Multi-Agent System Simulation Management Toolbox, 6th Balkan Conference in Informatics, Sep. 19-21, Thessaloniki, Greece, 2013, pp. 66-70.

[14] M.P. Thomas, I. Burruss, L. Cinquini, G. Fox, D. Gannon, L. Gilbert, G. Laszewski, K. Jackson, D. Middleton, R. Moore, M. Pierce, B. Plale, A. Rajasekar, R. Regno, E. Roberts, D. Schissel, A. Seth and W. Schroeder, Grid Portal Architectures for Scientific Applications, Journal of Physics: Conference Series, vol. 16, no. 1, (2005).


[15] EGEE Project, Enabling Grids for E-science [ONLINE] http://cordis.europa.eu/projects/rcn/80149_en.html (Accessed 04 Dec 2014), (FP6-INFRASTRUCTURES, 2004).

[16] B. Bansal and S. Bawa, Design and Development of Grid Portals, TENCON 2005, IEEE Region 10, 21-24 Nov, Melbourne, Australia, 2005.

[17] X.P. Medianero, B.M. Bonilla and M.V. Vargas, Grid Portals: Frameworks, Middleware or Toolkit, International Journal of Computer Science Issues (IJCSI), vol. 7, no. 3, (2010), pp. 6-11.

[18] D. Arias, M. Mendoza, F. Cintron, K. Cruz and W.R. Rivera, Grid Portal Development for Sensing Data Retrieval and Processing, International Workshop on Grid Computing Environments, 2006.

[19] L.M. Rosset, L.G. Nardin and J.S. Sichman, Use of High Performance Computing in Agent-Based Social Simulation: A Case Study on Trust-Based Coalition Formation, 7th Workshop-School on Agent Systems, Environments and Applications (WESAAC'13), 2013.

[20] P. Wittek and X. Rubio-Campillo, Scalable Agent-based Modelling with Cloud HPC Resources for Social Simulations, IEEE 4th International Conference on Cloud Computing Technology and Science (CloudCom), 2012, pp. 355-362.

[21] D.L. Poole and A.K. Mackworth, Artificial Intelligence: Foundations of Computational Agents, Cambridge University Press, New York, USA, (2010).

[22] I. Foster, Y. Zhao, I. Raicu and L. Shiyong, Cloud Computing and Grid Computing 360-Degree Compared, Grid Computing Environments Workshop, 2008, pp. 1-10.

[23] M.H. Seyyed and K.B. Amid, Cloud Computing vs. Grid Computing, ARPN Journal of Systems and Software, Vol 2, No 5, May (2012), pp. 188-194.

[24] A.H. Hosam, A. Hamza and A. Tariq, Comparison Between Cloud and Grid Computing: Review Paper, International Journal on Cloud Computing: Services and Architecture (IJCCSA), Vol 2, No 4, August (2012), pp. 1-21.


[25] I. Gorton, Cyberinfrastructures: Bridging the Divide between Scientific Research and Software Engineering, Computer, Vol. 47, No. 8, (2014), pp. 48-55.
[26] G. Coulouris, J. Dollimore, T. Kindberg and G. Blair, Distributed Systems: Concepts and Design, Boston: Addison-Wesley, 5th Edition, (2011).
[27] C. Kiourt and D. Kalles, Development of Grid-Based Multi Agent Systems for Social Learning, IEEE International Conference on Information, Intelligence, Systems and Applications (IISA 2015), Corfu, Greece, Jul. 04-07, 2015.
[28] S.F. Railsback, S.L. Lytinen and S.K. Jackson, Agent-based Simulation Platforms: Review and Development Recommendations, Simulation, Vol. 82, No. 9, September, (2006), pp. 609-623.
[29] J. Barbosa and P. Leitao, Simulation of multi-agent manufacturing systems using Agent-Based Modelling platforms, 9th IEEE International Conference on Industrial Informatics, Jul. 26-29, Portugal, 2011, pp. 477-482.
[30] N. Collier, Repast HPC Manual, 6th February, (2012).
[31] J. Decraene, Y.T. Lee, F. Zeng, M. Chandramohan, Y.C. Yong and M.Y.H. Low, Evolutionary Design of Agent-based Simulation Experiments (Demonstration), 10th Int. Conf. on Autonomous Agents and Multiagent Systems, 2011, pp. 1321-1322.
[32] L. Paulo, I. Udo and R. Claus-Peter, Parallelising Multi-agent Systems for High Performance Computing, The Third International Conference on Advanced Communications and Computation (INFOCOMP 2013), 2013.
[33] E. Blanchart, C. Cambier, C. Canape, B. Gaudou, T.-N. Ho, T.-V. Ho, C. Lang, F. Michel, N. Marilleau and L. Philippe, EPIS: A Grid Platform to Ease and Optimize Multi-agent Simulators Running, Advances on Practical Applications of Agents and Multiagent Systems, Advances in Intelligent and Soft Computing, Vol. 88, (2011), pp. 129-134.


[34] S. Coakley, M. Gheorghe, M. Holcombe, S. Chin, D. Worth and C. Greenough, Exploitation of High Performance Computing in the FLAME Agent-Based Simulation Framework, IEEE 14th International Conference on High Performance Computing and Communications, 2012, pp. 538-545.
[35] N. Collier and M. North, Repast HPC: A Platform for Large-scale Agent-based Modeling, in Large-Scale Computing Techniques for Complex System Simulations, John Wiley & Sons, Inc., (2011), pp. 81-110.
[36] J. Decraene, M. Chandramohan, M. Low and C. Choo, Evolvable Simulations Applied to Automated Red Teaming: A Preliminary Study, in Proceedings of the 42nd Winter Simulation Conference, 2010, pp. 1444–14.
[37] L.G. Nardin and J.S. Sichman, Simulating the impact of trust in coalition formation: A preliminary analysis, Advances in Social Simulation, Post-Proceedings of the Brazilian Workshop on Social Simulation, 2011, pp. 33-40.
[38] U. Manzoor and B. Zafar, Multi-Agent Modeling Toolkit – MAMT, Simulation Modelling Practice and Theory, Vol. 49, (2014), pp. 215-22.
[39] B. Bauer, J.P. Müller and J. Odell, An Extension of UML by Protocols for Multiagent Interaction, Fourth International Conference on Multi-Agent Systems (ICMAS 2000), Boston, IEEE Computer Society, 2000.
[40] F. Bellifemine, G. Caire and D. Greenwood, Developing Multi-Agent Systems with JADE, Wiley, (2007).
[41] D. Chen, K.G. Theodoropoulos, S.J. Turner, W. Cai, R. Minson and Y. Zhang, Large scale agent-based simulation on the grid, Future Generation Computer Systems, Vol. 24, No. 7, (2008), pp. 658-671.
[42] Y. Xie, Y.M. Teo, W. Cai and S.J. Turner, Service provisioning for HLA-based distributed simulation on the grid, The Nineteenth ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation (PADS 2005), Monterey, CA, USA, June 2005, pp. 282-291.


[43] R. Minson and G. Theodoropoulos, Distributing RePast agent-based simulations with HLA, European Simulation Interoperability Workshop 2004, 04E-SIW-046, June 2004.
[44] M. Marzolla, S. Ferretti and G. D'Angelo, Dynamic resource provisioning for cloud-based gaming infrastructures, Theoretical and Practical Computer Applications in Entertainment, Vol. 10, No. 3, (2012), pp. 1-20.
[45] B.-G. Chun, S. Ihm, P. Maniatis, M. Naik and A. Patti, CloneCloud: Elastic execution between mobile device and cloud, in Proc. of the 6th European Conference on Computer Systems (EuroSys'11), Salzburg, Austria, ACM, April 2011, pp. 301-314.
[46] E. Cuervo, A. Balasubramanian, D.-K. Cho, A. Wolman, S. Saroiu, R. Chandra and P. Bahl, MAUI: Making smartphones last longer with code offload, in Proc. of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys'10), San Francisco, California, USA, ACM, June 2010, pp. 49-62.
[47] D. Chess, C. Harrison and A. Kershenbaum, Mobile agents: Are they a good idea?, in Mobile Object Systems: Towards the Programmable Internet, 2005, Vol. 1222.
[48] J. Chen, X. Han and G. Jiang, A Negotiation Model Based on Multi-agent System under Cloud Computing, The Ninth International Multi-Conference on Computing in the Global Information Technology, Seville, Spain, June 22, 2014, pp. 157-164.
[49] U. Siddiqui, G.A. Tahir, A.U. Rehman, Z. Ali, R.U. Rasool and P. Bloodsworth, Elastic JADE: Dynamically Scalable Multi Agents Using Cloud Resources, Second International Conference on Cloud and Green Computing (CGC), 1-3 Nov, 2012, pp. 167-172.
[50] M. Al-Ayyoub, Y. Jararweh, M. Daraghmeh and Q. Althebyan, Multi-agent based dynamic resource provisioning and monitoring for cloud computing systems infrastructure, Cluster Computing, Vol. 18, No. 2, (2015), pp. 919-932.
[51] M.L. Littman, Markov games as a framework for multi-agent reinforcement learning, Eleventh International Conference on Machine Learning, 1994, pp. 157-163.
[52] D. Kalles and P. Kanellopoulos, On Verifying Game Design and Playing Strategies using Reinforcement Learning, Proceedings of the ACM Symposium on Applied Computing, special track on Artificial Intelligence and Computation Logic, Las Vegas, 2001.
[53] V.F. Farias, C.C. Moallemi, B. Van Roy and T. Weissman, Universal reinforcement learning, IEEE Transactions on Information Theory, Vol. 56, No. 5, (2010), pp. 2441-2454.
[54] D. Billings, Thoughts on RoShamBo, ICGA Journal, Vol. 23, No. 1, (2000), pp. 3-8.
[55] R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, (1998).
[56] J.G. March, Exploration and Exploitation in Organizational Learning, Organization Science, Vol. 2, No. 1, (1991), pp. 71-87.
[57] R. Sutton, Learning to Predict by the Methods of Temporal Differences, Machine Learning, Vol. 3, No. 1, (1988), pp. 9-44.
[58] C. Shannon, Programming a Computer for Playing Chess, Philosophical Magazine, Vol. 41, No. 4, (1950), pp. 265-275.
[59] A. Samuel, Some Studies in Machine Learning Using the Game of Checkers, IBM Journal of Research and Development, (1959), pp. 210-229.
[60] F.H. Hsu, Behind Deep Blue: Building the Computer that Defeated the World Chess Champion, Princeton University Press, (2002).


[61] J. Schaeffer, Y. Bjoernsson, N. Burch, A. Kishimoto, M. Mueller, R. Lake, P. Lu and S. Sutphen, Checkers is Solved, Science, Vol. 317, No. 5844, (2007), pp. 1518-1522.
[62] G. Tesauro, Temporal Difference Learning and TD-Gammon, Communications of the ACM, Vol. 38, No. 3, (1995), pp. 58-68.
[63] G. Leban, B. Zupan, G. Vidmar and I. Bratko, VizRank: Data Visualization Guided by Machine Learning, Data Mining and Knowledge Discovery, Vol. 13, No. 2, (2006), pp. 119-136.
[64] A. Georgas, D. Kalles and V.A. Tatsis, Scientific Workflows for Game Analytics, in Encyclopedia of Business Analytics and Optimization, ed. John Wang, (2014), pp. 2115-2125.


Chairi Kiourt is a PhD candidate at the Hellenic Open University, School of Science and Technology. He received a Bachelor's degree in electrical engineering from the Technological Educational Institute of Kavala, Greece, and an M.Sc. in Systems Engineering and Management from the University of Thrace, Greece. His research is on large-scale agent socialization experiments, using game playing and machine learning.

Dimitris Kalles is an Assistant Professor in Artificial Intelligence at the Hellenic Open University, School of Science and Technology. He holds a PhD in Computation from UMIST (Manchester, UK) and actively researches artificial intelligence, software engineering and e-learning. He has taught several courses and supervised numerous dissertations and theses, both undergraduate and postgraduate, at the Hellenic Open University and at other universities. He has held positions in research and in industry and has served as an expert evaluator for the European Commission, the Corallia Clusters Initiative and several other organizations. He has served as Chairperson and Secretary General of the Hellenic Society for Artificial Intelligence.

