Building A Social Multi-Agent System Simulation Management Toolbox

Chairi Kiourt
School of Science and Technology, Hellenic Open University, Patras, Greece
[email protected]

Dimitris Kalles
School of Science and Technology, Hellenic Open University, Patras, Greece
[email protected]

ABSTRACT

The development of a novel Multi-Agent-Based Social Simulation (MABS) platform is undertaken after considering the advantages and disadvantages of existing platforms. We study their adaptability to, and usage with, an existing strategy board game and attempt to model tournaments in social environments. To facilitate this experimentation, we arrive at the need for a new platform which features dynamic handling of game objects at runtime.

Categories and Subject Descriptors

I.2.1 [Computing Methodologies]: Artificial Intelligence – Applications and Expert Systems, Games. I.2.11 [Computing Methodologies]: Artificial Intelligence – Distributed Artificial Intelligence, Intelligent agents, Multi-Agent systems.

General Terms

Experimentation, Measurement.

Keywords

Simulation, Multi-Agent System, Reinforcement Learning, Social Organization.

1. INTRODUCTION

Machine Learning (ML), Multi-Agent Systems (MAS) and Social Organization (SO) all attempt to transfer traits of human behavior to computing [1]. In this context, whenever more than two agents act autonomously in an environment, social learning is introduced as a learning technique for the agents [1][2]. Social learning research has been inspired by the ability of humans to learn in environments which are rich in people, interactions and unknown information [3].

For a game agent, the social environment is represented by a game with all its components and entities [2][4]. Learning in a game is said to occur when an agent changes a strategy choice in response to new information, thus mimicking human behavior [1][3][4][5][6]. In previous work we presented a micro-social environment with competitive agents playing a strategy board game [4]; there, we used RLGame, a zero-sum game that employs Reinforcement Learning to improve game-play, as our research tool [7].

A recent promising social environment application using checkers was implemented in terms of a tournament [6]. Robotics has also proved a powerful test-bed for applying social learning techniques in social environments [9][10]. Across implementations of many social learning techniques in different robotic environments [9], whether social or not, results appear closely related to those in social strategy board games [4], namely that social environments produce stronger player behaviors and strategies.

However, modeling and monitoring games requires powerful simulation tools. Visualizing social interactions in a tool makes the observation and customization of agents easier and more efficient; this has led to the development of platforms which have been receiving increasing attention, such as MASON [5][8], Repast [8], Jason [5] and NetLogo [1]. Comparisons of these platforms highlight their advantages and disadvantages [11], reinforcing the belief that Multi-Agent Based Simulation (MABS) tools are key to studying multi-agent systems.

The rest of this paper is structured in four sections. The next section provides a brief background on existing MABS platforms and a brief description of RLGame and the social learning aspects of its environment. The third section describes our social system and MABS platform. The last section summarizes the development effort and sets out the scheduled work on the platform.

2. A BRIEF BACKGROUND DESCRIPTION

This section introduces the strategy board game used, social organizations, and the simulation of cooperative, competitive or mixed environment systems. We also present and analyze some of today's widely used simulation systems.

2.1 Multi-Agent Social Simulation Platforms

Computer social simulation began to be used widely in the 1990s; it was presented as a way of modeling and understanding social processes [1]. Simulation introduces the possibility of a new way of thinking about social and economic processes, based on ideas about the emergence of complex behavior from relatively simple activities [2][15].

Quite a few Agent Based Modeling (ABM) tools are currently available, each with distinct functionalities, graphical interfaces and programming languages. NetLogo, Jason, Repast and MASON, among others, are also used as MAS simulation tools, each with its own characteristics and specific applications [1][11][17]. We briefly review them below, as the scope of this work is not to survey them in detail.

The MASON platform is based on a modular layered architecture and was built from scratch [16]. It is developed in Java, with a utility layer consisting of general classes that can be used as base libraries. It then features a model layer, a small collection of classes comprising a discrete-event schedule, a high-quality random number generator, and a variety of fields which hold objects and associate them with locations. The visualization layer is the last one, providing a display view of the fields and also serving as an administration system for the simulation [5][16]. This GUI toolkit, the visualization layer, enables visualization and manipulation of the model in both 2D and 3D, and produces screenshots and movies [16].

NetLogo was first released in 1999 [1] and is also Java-based. It builds on an earlier series of multi-agent modeling languages, including StarLogo and StarLogoT. The GUI allows users to dynamically add manipulation components, to obtain information about the objects, functions and many other properties of the platform, and to edit the programs controlling objects and components [11]. Its programming language is based on Logo.

Repast initially appeared as a Java re-coding of Swarm, though it did not adopt all of Swarm's design and architecture [8]; today, it is implemented in several languages, such as Java, .NET and Python. It was built specifically to support the domains of the social sciences and includes dedicated tools for each domain. The architecture of this platform is based on six modules [8], the most important of which is the Adaptive Behaviors Module, which provides the adaptability of the implemented agent behaviors. Users can build social models through GUI menus that can be managed with Python code. The GUI provides animated visualization of the simulation, as well as the capability to take snapshots of the simulation and convert them into movies. Quite a few simulation models are based on Repast.

Jason is also Java-based and is developed as a social environment simulation platform which implements the life-cycle of each autonomous agent [5]. As each agent corresponds to a thread, the number of agents that can run is limited by the characteristics of the JVM, in practice to about 100 threads. The Jason platform is usually used as a social simulation platform and as an autonomous agent analysis tool [5]. It is considered especially suitable for cognitive agents based on the BDI (Beliefs-Desires-Intentions) architecture. Moreover, the language interpreted by Jason is an extension of AgentSpeak(L), an agent-oriented programming language based on the BDI architecture.

2.2 Games as Social Multi-Agent Systems

We usually consider Social Organizations as environments where more than two agents act autonomously and each one has its own information about the world and the other agents [2][19]. For a game agent, the social environment is a game with all its components and entities [3].

A collection of artificial entities which communicate with each other and act in an environment forms an artificial organization (also referred to as a population, society, group, world, environment or universe). A key property is the size and complexity of the contents of the environment. Using the typology introduced in conventional sociology, one identifies three levels of organization in multi-agent systems [2]:

- The micro-social level, where we are interested in the interactions between agents and the various forms of links between two or more agents, i.e. cooperation or competition links.

- The level of groups, where we are interested in the intermediary structures which intervene in the composition of a more complete organization. In these organizations, agents form groups for a common goal. At this level, interest is also directed at the social choice of grouping and the hierarchical layers of agent groups [19]. Usually, grouping appears as an instantiation of the cooperation extreme; on the other hand, a cooperative environment is sometimes built out of many sub-groups, as in the simulation of industrial manufacturing lines [11].

- The level of global societies (or populations), where interest mainly concentrates on the dynamics of a large number of agents, together with the general structure of the system and its evolution. At this level, anarchy is an important issue.

Two principal categories apply as regards the interaction between agents [18][19]:

- Cooperation: agents can share knowledge and utility functions, cooperating towards a specific goal (for example, RoboCup teams).

- Competition: an agent can only win when some other agent loses (for example, a zero-sum game).

Our workbench, RLGame, was initially presented as a competition extreme. It is a tool for studying multi-agent systems via its tournament version, RLGTournament [4], which implements a round-robin scheme to pair participants against each other. RLGTournament fits the description both of an autonomous organization [2] and of a social environment [3][5].





The RLGame board game [7] is contested by two players and their pawns on an n x n square board. Two a x a square bases lie on opposite board corners; these are initially populated by β pawns for each player, with the white player starting off the lower left base and the black player starting off the upper right one. The goal for each player is to move a pawn into the opponent's base or to force all opponent pawns out of the board (it is the player, not the pawn, who acts as an agent in our scenario).

The base is considered a single square, so a pawn can move out of the base to any adjacent free square. Players take turns and pawns move one at a time, with the white player moving first. A pawn can move vertically or horizontally to an adjacent free square, provided that the maximum distance from its base is not decreased (so, backward moves are not allowed). A pawn that cannot move is lost (more than one pawn may be lost in one move); a player also loses by running out of pawns. The leftmost board in Figure 1 demonstrates a legal ("tick") and an illegal ("cross") move for the pawn pointed to by the arrow, the illegal move being due to the rule that does not allow decreasing the distance from the home (black) base. The rightmost boards demonstrate the loss of pawns, with arrows showing pawn casualties. A "trapped" pawn is automatically removed from the game; so, when there is no free square next to the base, the remaining pawns in the base are lost.

Figure 1. Examples of Game Rules Application.
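To make the movement rule concrete, the following minimal sketch checks the legality of a single pawn move. It is written under stated assumptions: the class and method names are ours, the home base is treated as the single square (baseX, baseY) as described above, and Manhattan distance stands in for the distance measure (which this summary leaves unspecified); the actual RLGame implementation may differ.

// Minimal, illustrative legality check for one pawn move; class and method
// names are ours, and Manhattan distance is an assumption made for clarity.
final class MoveRules {

    // occupied[x][y] is true if square (x, y) holds any pawn; the moving
    // pawn's home base is treated as the single square (baseX, baseY).
    static boolean isLegal(boolean[][] occupied, int fromX, int fromY,
                           int toX, int toY, int baseX, int baseY) {
        int n = occupied.length;                                  // n x n board
        if (toX < 0 || toY < 0 || toX >= n || toY >= n) return false;
        if (occupied[toX][toY]) return false;                     // target must be free
        if (Math.abs(fromX - toX) + Math.abs(fromY - toY) != 1)   // one step,
            return false;                                         // no diagonals
        // No backward moves: the distance from the home base must not decrease.
        return dist(toX, toY, baseX, baseY) >= dist(fromX, fromY, baseX, baseY);
    }

    private static int dist(int x, int y, int bx, int by) {
        return Math.abs(x - bx) + Math.abs(y - by);               // Manhattan distance
    }
}

For instance, with the white base treated as square (0, 0), a white pawn at (3, 4) may step to (4, 4) or (3, 5) when those squares are free, but not back to (2, 4) or (3, 3).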






Each player approximates its value function with a neural network [12]. Figure 2 shows the neural network and the learning mechanism of each player. As input layer nodes we use the board positions after the next possible move, plus a flag on whether a pawn has entered an enemy base and some flags on whether the number of pawns in the home base has exceeded certain thresholds. The hidden layer consists of half as many nodes. There is one output node; it stores the probability of winning when one starts from a specific game-board configuration and then makes a specific move.
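The following sketch illustrates that input encoding and network sizing. It is a minimal reading of the description above, not the platform's actual code: the class name, the +1/-1 square coding and the threshold representation are our own assumptions.

// Illustrative encoding of a board state into a network input vector,
// following the description above; RLGame's exact feature coding may differ.
final class StateEncoder {

    // board: 0 = empty, +1 = own pawn, -1 = opponent pawn (our assumption)
    static double[] encode(int[][] board, boolean pawnInEnemyBase,
                           int pawnsInHomeBase, int[] thresholds) {
        int n = board.length;
        double[] input = new double[n * n + 1 + thresholds.length];
        int k = 0;
        for (int x = 0; x < n; x++)                    // one node per board square
            for (int y = 0; y < n; y++)
                input[k++] = board[x][y];
        input[k++] = pawnInEnemyBase ? 1 : 0;          // enemy-base flag
        for (int t : thresholds)                       // home-base pawn-count flags
            input[k++] = pawnsInHomeBase > t ? 1 : 0;
        return input;
    }

    static int hiddenNodes(int inputNodes) {
        return inputNodes / 2;                          // half as many hidden nodes
    }
}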

The agent's goal is to learn an optimal policy that maximizes the expected sum of rewards over a specific time, determining which action should be taken next given the current state of the environment. The policy for selecting between moves is an ε-greedy policy, with ε denoting the probability of selecting the best move according to present knowledge (exploitation) and 1-ε the probability of a random move (exploration). At the beginning, all states have the same value except for the final states. After each move, the values are updated through TD(0.5); temporal difference (TD) learning is a combination of Monte Carlo and dynamic programming ideas [12][13]. As a result, collective training of a neural network is effected by pitting a player against other players, so that knowledge (experience) is accumulated.
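A minimal sketch of this move-selection policy follows. Note that, following the paper's convention, ε is the probability of exploiting; the class names are ours, and the eligibility traces of TD(0.5) as well as the neural network update itself are omitted for brevity.

import java.util.List;
import java.util.Random;

// Sketch of the e-greedy policy described above: with probability epsilon
// pick the after-state with the highest estimated value, otherwise explore.
final class EpsilonGreedyPolicy {
    interface ValueFunction { double value(double[] state); }

    private final Random rng = new Random();
    private final double epsilon;  // probability of exploitation (paper's convention)

    EpsilonGreedyPolicy(double epsilon) { this.epsilon = epsilon; }

    int selectMove(List<double[]> afterStates, ValueFunction v) {
        if (rng.nextDouble() >= epsilon)                  // explore: probability 1 - epsilon
            return rng.nextInt(afterStates.size());
        int best = 0;                                     // exploit: best-valued after-state
        for (int i = 1; i < afterStates.size(); i++)
            if (v.value(afterStates.get(i)) > v.value(afterStates.get(best)))
                best = i;
        return best;
    }
}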

Figure 2. Learning mechanism of RLGame.

Since the backbone of the agent's knowledge is the neural network with which it approximates its value function, different parameters for the neural network and the TD algorithm correspond to a variety of playing "characters", encompassing fast/slow learners, risky/conservative players, and so on. Initial experiments demonstrated that, when trained with self-playing, both players would converge to having nearly equal chances to win [7], and that self-playing would achieve weaker performance compared to a computer playing against a human player, even with limited human involvement [14]. Additionally, further experiments implementing social organizations for RLGame suggested that socially trained agents are stronger than self-trained ones [4].

3. ANALYSIS AND ARCHITECTURE

The scope of this section is to briefly compare key aspects of the MABS tools, based on surveys that provide detailed analyses of their features [1][5][11][16][17]; Table 1 presents a summary. Note that all these platforms are available for free, having evolved from academic research projects.

Table 1. Characteristics of Some MABS Platforms.
[The table rates MASON, Jason, NetLogo and Repast as Good, Fair or Poor on each of: User Interface, Programming Effort, Execution Speed, Dynamic Properties, Log Creation, Number of Agents, Adaptability, Experiment Replay, and Availability (free).]

Table 1 suggests that there is no perfect simulation platform. Among the points hampering the use of these platforms are their lack of adaptability to series of games with large numbers of interactive agents, and the absence of dynamic, human-driven customization of the properties of the environments and their elements. In our case study, the need for a large number of agents in different social environments was a key requirement for new experiments.

A good starting point for MABS research is NetLogo. It has a relatively smooth learning curve, combined with good documentation and free tutorials. Its well-structured GUI provides powerful dynamic creation of components, which can be administered via a simple programming language. The adaptability of this platform to existing environments is well supported, with plenty of documentation. On the other hand, the weak point of this platform, as of all of them, is the restriction of the data logs available for further analysis of the social environments. We believe that MASON is a well-supported platform for a user who is not interested in building a new environment, but simply wants to automate the analysis of some artificial intelligence techniques.

A very significant issue is also the adaptability of the characteristics of an agent; as in the real world, a player may begin with a defensive mood and close the game with aggressive strategies, or vice versa. Dynamic interference by a human player could also be a key means of eliciting a range of playing behaviors for improving social learning.

Social science research does not end when the experiments do. Since the analysis of the experiments is a most important part of it, we directed our development towards providing extensive logs, which the other platforms could not offer.

A significant disadvantage shared by all the platforms is the number of agents they support. Each platform imposes a specific limit on the agents that can run in an environment, Jason most notably. From this perspective, Jason is best used as an agent analysis tool for MAS environments. Moreover, adapting these platforms to new games or projects, especially strategy board games, is problematic because of the difficult adjustments required.

We therefore decided to develop a new platform to fill the gaps in the existing ones and to cover our needs for managing tournament data [4]. All in all, we were led to a new toolbox because we needed improved management of agent and environment features and support for analyzing large history files.

3.1 System Structure

The system is developed in Java, on top of RLGame. Initially conceived as a monitoring tool, it evolved into managing the attributes of objects during the execution of experiments and setting suitable attribute values for the training of the agents.

The system is separated into three modular layers, each consisting of several objects and sub-objects. Layers do not communicate with each other directly. Figure 3 illustrates the structure of the communication between the layers and the objects.


Figure 3. RLGame System Layers.

At the bottom, a layer contains the principal objects of a social environment: the agents, the game and the tournament scheduling systems. In the middle sits the MABS platform, a system which routes the communication between the social environment (Objects layer) and the top layer (Monitoring layer). It processes all the attributes of a social environment before and during the execution of an experiment; it also analyzes the process and data of the experiments and presents them on a GUI, effectively acting as the coordinator of the whole system. The Monitoring layer provides visual communication between the system and the user.
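The sketch below illustrates this mediated communication under stated assumptions: all interface and class names are our own, and the real platform's interfaces are certainly richer.

// Sketch of the layer separation described above; all names are ours.
// The Objects layer and the Monitoring layer never talk to each other
// directly: every attribute change is routed through the MABS platform.
interface ObjectsLayer    { void setAttribute(String name, Object value); }
interface MonitoringLayer { void display(String event); }

final class MabsPlatform {
    private final ObjectsLayer objects;
    private final MonitoringLayer monitor;

    MabsPlatform(ObjectsLayer objects, MonitoringLayer monitor) {
        this.objects = objects;
        this.monitor = monitor;
    }

    // A GUI request (e.g., changing a learning rate mid-experiment) is
    // processed here before it reaches the running social environment.
    void updateAttribute(String name, Object value) {
        objects.setAttribute(name, value);
        monitor.display("attribute '" + name + "' set to " + value);
    }
}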


All objects, along with their bidirectional interactions and communications with the MABS platform, are presented in Figure 4.

Figure 4. System Objects: Communication and Interactions.

3.2 System Base and Object Attributes

The bottom layer is structured into four parallel objects, each consisting of many sub-objects. In all cases, the objects and sub-objects interact with each other to produce experience, especially social experience. The basic objects are: Agent, Game, Tournament Scheduling and Logs-Database.

Starting with the first object of the bottom layer, Agent, the platform can control an unlimited number of diverse agent characters, subject only to the needs of the experiments and the restrictions of the computer hardware; for example, in our previous work [4] we conducted a micro-social experiment with a small number of agents interacting in a competitive environment. The platform makes it possible to create new agent characters, and to add existing ones, by setting a simple, diversified exploitation-exploration trade-off dynamically for each player. Other agent characteristic variables are also available, such as the learning rate of the underlying neural networks. All the online-controlled sub-objects are shown in Figure 4, as a graph of their connections with the MABS platform layer and the Monitoring layer. A minimal sketch of such a dynamically adjustable agent character follows.
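This sketch only names the two attributes discussed above; the field and class names are our own assumptions, not the platform's actual API.

// Illustrative agent "character" whose settings may be edited from the GUI
// while an experiment runs; names and fields are our own illustration.
final class AgentCharacter {
    // volatile: values may be changed from the GUI thread during a game
    private volatile double epsilon;       // exploitation probability of the e-greedy policy
    private volatile double learningRate;  // learning rate of the underlying neural network

    AgentCharacter(double epsilon, double learningRate) {
        this.epsilon = epsilon;
        this.learningRate = learningRate;
    }

    // Invoked by the MABS platform layer when the user edits a value on the fly.
    void setEpsilon(double e)      { epsilon = e; }
    void setLearningRate(double r) { learningRate = r; }

    double epsilon()      { return epsilon; }
    double learningRate() { return learningRate; }
}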

The second object of the Objects layer is the game (currently we use only RLGame, but an implementation of additional board games is forthcoming). Control of its attributes lies with the middle layer and can also be exercised from the top layer during an experiment. The managed attributes are: board and base size, number of pawns, and number of games per match. These values allow the learning experiments to be scaled. The next object, tournament scheduling, implements a dynamic tournament scheduling class that covers the needs of creating social environments (for example, Round Robin and Elimination). The sub-objects of this layer provide the ability to increase the complexity of socialization in environments, in search of better agent learning performance (consider, as an example, the implementation of double scheduling algorithms running in parallel); a minimal pairing sketch follows.
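The round-robin scheme itself is simple to state in code; the class name here is ours, and elimination brackets and parallel double scheduling are not shown.

import java.util.ArrayList;
import java.util.List;

// Minimal round-robin pairing: every player meets every other player once.
final class RoundRobinScheduler {
    static List<int[]> pairings(int players) {
        List<int[]> matches = new ArrayList<>();
        for (int i = 0; i < players; i++)
            for (int j = i + 1; j < players; j++)
                matches.add(new int[] { i, j });  // one match per pair of players
        return matches;
    }
}

For four players, pairings(4) yields the six matches (0,1), (0,2), (0,3), (1,2), (1,3) and (2,3).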

The last object of the bottom layer, logs-database, is an additional tool focused on creating data logs of the performances, histories, learning rates and many other characteristics of agents and games. Additionally, it produces log files describing each pawn movement of every agent, for the purpose of analysis with a playback tool. The ability to create various agent characters and to control them from a graphical interface helps collective intelligence emerge from the combination of the social environment and the user's interaction with it.
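As an illustration of the kind of per-move record that makes playback possible, consider the following sketch; the record name and fields are our own, and the platform's actual log schema records far more (learning rates, histories, and so on).

// Illustrative per-move log record for the playback tool.
record MoveLogEntry(long gameId, int moveNumber, String player,
                    int fromX, int fromY, int toX, int toY) {

    // One comma-separated line per pawn movement, appended to a game's log file.
    String toCsvLine() {
        return gameId + "," + moveNumber + "," + player + ","
                + fromX + "," + fromY + "," + toX + "," + toY;
    }
}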

Figure 5. Social Learning System Objects and Sub-Objects.

Figure 5 shows the attributes of the objects (programming classes) that are directly controllable from the middle layer (the MABS platform), as well as a simple example of the GUI and its top-layer monitoring options. New tools can be added, so that the system can be extended in many social learning multi-agent system directions (for example, game playback, pawn trajectory comparison (as in the top left window in Figure 5), game comparison, learning technique comparison, player efficiency and history analysis, and player ratings, among others). Simple GUI components are designed for the top layer, giving the user an easily manageable tool for carrying out experiments with no need to program. Plotting tools are also available for the ranking of each player.

Figure 6. Graphical User Interface of the Platform.

As Figure 6 shows, three main panels are available. The first panel is the starting point for executing a simple experiment. It begins with the customization of the experiment, offering the possibility to choose an existing tournament or to create a new one with its players and their default characteristics. It also provides the ability to pause, continue and stop the execution at any time. The second panel is split in two parts: the right one presents the game in real time, while the left one summarizes the tournament in tree mode, where elements can be expanded and collapsed at will, allowing the user to manipulate some properties on the fly (for example, the learning rate). The last panel is an output window providing information about any change in how an experiment is run (akin to a console).

4. DISCUSSION AND CONCLUSIONS

Our research and development on Social Organizations and Multi-Agent Systems aims to analyze and understand the effects of Social Reinforcement Learning on game playing agents. The reason for developing a new MABS platform was to cover our requirements for tournament-based social organization and to overcome the difficulties of adapting and/or changing an existing platform. Our approach gives the user an innovative monitoring system, allowing one to dynamically amend key object attributes.


The layered structure of this system allows for clean interaction between all layers, objects and sub-objects of the system. Additionally, it promotes the system's flexibility and adaptability, so that new or external tools can be accommodated in subsequent development. The layered structure also allows layers to be isolated, facilitating deployment in specialized settings; for example, one could opt to disable the top (visual monitoring) layer if the system is deployed on a grid infrastructure and, perhaps, use a grid workflow portal instead.
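A sketch of that deployment choice follows; all names here are our own illustration, not the platform's actual launcher, and the no-op monitor simply discards GUI events on a grid run.

// Sketch: on a grid, the visual monitoring layer is swapped for a no-op,
// so experiments run headlessly while the other layers stay unchanged.
final class ExperimentLauncher {
    interface Monitor { void display(String event); }

    public static void main(String[] args) {
        boolean headless = args.length > 0 && "--headless".equals(args[0]);
        Monitor monitor = headless
                ? event -> { }                          // grid run: discard GUI events
                : event -> System.out.println(event);   // stand-in for the Swing GUI
        monitor.display("tournament started");
        // ... schedule matches, routing attribute changes through the platform layer
    }
}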


Several important issues remain to be addressed. We expect to implement more games in the system, to investigate its usefulness with both simpler and more complex gaming environments. By developing an agent ranking tool, we also aim to derive player quality metrics across different populations of social environments. Additionally, a trajectory analysis tool could help players avoid unnecessary actions. And, of course, to pursue our fundamental aim, we need a middle-layer object to investigate the collective intelligence of agents. It is evident, then, that the key element of our approach is to have a platform to which tools can easily be added, depending on the needs of users who aim to analyze games with large numbers of players.

5. REFERENCES

[1] Gilbert N. and Troitzsch K.G. 2005. "Simulation for the Social Scientist", 2nd ed., Open University Press.
[2] Ferber J. 1999. "Multi-Agent Systems: An Introduction to Distributed Artificial Intelligence", Addison-Wesley, Reading, MA.
[3] Marivate V.N. 2008. "Social Learning methods in board game agents", IEEE Symposium on Computational Intelligence and Games (CIG '08) (Perth, Australia, 2008), pp. 323-328.
[4] Kiourt C. and Kalles D. 2012. "Social Reinforcement Learning in Game Playing", IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2012) (Athens, Greece, Nov. 7-9, 2012), pp. 322-326.
[5] Caballero A., Botia J. and Gomez-Skarmeta A. 2011. "Using cognitive agents in social simulations", Engineering Applications of Artificial Intelligence 24(7), pp. 1098-1109.
[6] Al-Khateeb B. and Kendall G. 2011. "Introducing a Round Robin Tournament into Evolutionary Individual and Social Learning Checkers", Developments in E-systems Engineering (Dubai, United Arab Emirates, December 2011).
[7] Kalles D. and Kanellopoulos P. 2001. "On Verifying Game Design and Playing Strategies using Reinforcement Learning", Proceedings of the ACM Symposium on Applied Computing, special track on Artificial Intelligence and Computation Logic (Las Vegas, 2001).
[8] North M., Collier N. and Vos J. 2006. "Experiences creating three implementations of the Repast agent modeling toolkit", ACM Transactions on Modeling and Computer Simulation 16(1), pp. 1-25.
[9] Thomaz A.L. and Cakmak M. 2009. "Social Learning Mechanisms for Robots", International Symposium on Robotics Research (ISRR).
[10] Cakmak M., DePalma N., Arriaga R.I. and Thomaz A.L. 2010. "Exploiting social partners in robot learning", Autonomous Robots 29(3-4), pp. 309-329.
[11] Barbosa J. and Leitao P. 2011. "Simulation of multi-agent manufacturing systems using Agent-Based Modelling platforms", 9th IEEE International Conference on Industrial Informatics (INDIN) (Portugal, Jul. 26-29, 2011), pp. 477-482.
[12] Sutton R. and Barto A. 1998. "Reinforcement Learning: An Introduction", MIT Press, Cambridge, MA.
[13] Sutton R. 1988. "Learning to Predict by the Methods of Temporal Differences", Machine Learning 3(1), pp. 9-44.
[14] Kalles D. and Ntoutsi E. 2002. "Interactive Verification of Game Design and Playing Strategies", Proceedings of the IEEE International Conference on Tools with Artificial Intelligence (Washington, D.C., 2002).
[15] Simon H.A. 1996. "The Sciences of the Artificial", MIT Press, Cambridge, MA.
[16] Luke S., Cioffi-Revilla C., Panait L., Sullivan K. and Balan G. 2005. "MASON: A Multi-Agent Simulation Environment", Simulation: Transactions of the Society for Modeling and Simulation International 81(7), pp. 517-527.
[17] Bordini R.H. and Hübner J.F. 2009. "Agent-based simulation using BDI programming in Jason", in Uhrmacher A.M. and Weyns D. (eds.), Multi-Agent Systems: Simulation and Applications, CRC Press, pp. 451-476.
[18] Poole D.L. and Mackworth A.K. 2010. "Artificial Intelligence: Foundations of Computational Agents", Cambridge University Press, New York.
[19] Shoham Y. and Leyton-Brown K. 2009. "Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations", Cambridge University Press.
