Governing Agent Interaction in Open Multi-Agent ...

Governing Agent Interaction in Open Multi-Agent Systems with Fault Tolerant Strategies Maíra Athanázio de C. Gatti¹, Rodrigo Paes¹, Gustavo Carvalho¹, Luis Rodrigues¹, Carlos J. P. de Lucena¹, Nora Faci², Jean-Pierre Briot³, and Zahia Guessoum³ ¹Software Engineering Laboratory, PUC-Rio Rio de Janeiro, Brazil {mgatti, rbp, guga, lfrodrigues, lucena}@les.inf.puc-rio.br http://wiki.les.inf.puc-rio.br/leski/bin/view/Main/Governance ²MODECOCReSTIC, IUT de Reims Reims, France [email protected] ³OASIS and SRC teams, LIP6, Universit´e Pierre et Marie Curie (Paris 6) Paris, France {Jean-Pierre.Briot, Zahia.Guessoum } @lip6.fr

Abstract.

Software systems are increasingly becoming distributed, open and ubiquitous assets. While open system components are often autonomous, they behave unpredictably when unforeseen situations arise. Taming this uncertainty is a key issue for dependable open software development. This work proposes law enforcement approach that uses fault tolerant strategies to develop dependable open systems. Despite all the research done in the last years on the development of fault-tolerant applications and of approaches to enforce secure interaction protocols in multi-agent systems, there is no mechanism which uses the law elements of interaction’s control which improves the agent criticality analysis. As we will see, this mechanism would increase dependability of those systems. Thus it is proposed a solution to achieve it through the integration of the DimaX framework, which is a framework for dynamic replication, and the XMLaw, which is composed by a declarative language for specifying the norms of the systems and a framework for implementing the law enforcement approach for regulating agents’ interaction.

1 Introduction Multi-Agent systems (MAS) are decentralized and self-organizing systems that consist of autonomous entities (agents) that solve tasks by cooperation [1]. The agents coordinate their actions in a decentralized manner by sending and receiving messages. The absence of centralized coordination data makes it hard to determine the current state of the systems and/or to predict the effects of actions. Moreover, agents are therefore faced with significant degrees of uncertainty for making decisions since it is very hard to devise all the possible situations that may arise in the execution context. Taming this uncertainty is a key issue for open software development. In such circumstances, reliability is a strategy for dealing with uncertainty associated with interactions in open systems [2]. However, in critical applications such as business environments or government agencies (hospitals, police, justice, etc.), the behavior of global system must be taken into account and structural characteristics of the domain have to be incorporated [3]. Concepts as organizational rules [4], norms and institutions [5][6], and social structures [7] arise from the idea that the effective engineering of MAS needs high-level, agent-independent concepts and abstractions that explicitly define the organization in which agents live [7]. These are the rules and global objectives that govern the activity of an enterprise, group, organization or nation. From the organizational point of view this creates a need to check conformance of the actual behavior of the society to the behavior desired by the organization [8]. Hence, these laws must be enforced

to delimit tolerated autonomous behavior and are also used to foster the development of trusted systems. The bottom line is that laws are used to represent the valid interactions in open MAS applications [2]. Due to the need of enforcing the laws through the agents’ interaction, some works have appeared in the last years to propose mechanisms for regulating agent’s interactions in complex scenarios. Here we are going to use the approach proposed by [9] through XMLaw description language, which allows for the runtime enforcement of agents compliance based on a law-governed mechanism. Moreover, the dependability of a computing system is its ability to deliver service that can justifiably be trusted. Correct service is delivered when the service implements the system function. Considering that it is very hard to determine the state of autonomous systems due to the absence of centralized coordination data [11], there is the need to ascertain open multi-agent systems dependability, that is a justified trust that they will satisfactory perform their missions and not cause any catastrophes [10]. Fault tolerance is said to be a useful technique to increase the reliability of a distributed system. Considering that an Open MAS is a special type of such systems, there is a need to study the particular characteristics of these Open MAS in order to reach higher degrees of dependability. The Agent Replication Technique, which is the act of creating one or more duplicates of one or more agents in a multi-agent system, seems to be an efficient way to achieve fault tolerance in an Open Multiagent System, since fault tolerance is generally implemented by error detection and subsequent system recovery. Hence we can increase the dependability of an Open MAS just by implementing the Agent Replication technique. It is also necessary that this approach should be dynamic and adaptative since this kind of Open MAS are very dynamics [12]. In this work, we choose to use the Dima X framework, which is a framework for dynamic replication of agents that ensure dependability as transparent as possible to the application. It focuses on scalability and adaptive for large-scale multi-agent systems. If our main goal is to arise as much as possible the dependability of multi-agent systems in order to guarantee a high degree of reliability of that kind of systems, a platform that integrate these two approaches (dependability achieved by fault tolerance and by governance of agents’ interaction) would be an efficient way of doing it. So, we can have a platform with high degree of dependability and a law-governed approach to improve the criticality analysis , which defines the number of agent's replicas, and at the same time we could provide the ability of monitoring and enforcing the behavior of agents through the enforcement of laws.

Outline This paper is organized as follow: section 2 and 3 describes the context of this work. Section 2 describes the law enforcement mechanism works during the agents’ interactions and section 3 describes the types of failures that may occur in a MAS, the ones that this work addresses and the framework that we used in the integration which implements fault tolerance. Section 4 presents the problem description and section describes then the architecture that we developed to achieve the proposed solution for solving the problem description.

2 Law-Governed Interaction As we have already seen, distributed software agents are independently implemented, i.e., the development takes place without a centralized control. In order to achieve a coherent system, we assume that every agent developer may have an a priori access to the open system specification, including the protocol description and the interaction laws. In this scenario, law-governed architectures can be designed to guarantee that the specifications will be obeyed. In this kind of architecture, a mediator is needed to intercept messages and interpret the laws that were previously described. Besides monitoring the conversation of agents, it is necessary to specify the interaction rules that will govern this mediation. In this section, we explain the description language XMLaw [9]. XMLaw is used to represent the interaction rules of an open system specification. Those rules are interpreted by a mechanism that at runtime is used to analyze the compliance of open MAS with interaction laws

2.1 XMLaw Overview The model presented in this section is used to represent interactions in an open distributed environment. Basically, interactions should be analyzed, and after that, described using the concepts proposed in the model. Then, the concepts are mapped to a declarative language, called XMLaw. Interaction’s definitions are interpreted by a software framework that monitors components’ interaction and enforces the behavior specified on the language. Once interaction is specified and enforced, despite the autonomy of the agents , the system’s global behavior is better controlled and predicted. Interaction specification of a system is also called laws of a system. This is because besides the idea of specification itself, interactions are monitored and enforced. Then, they act as laws in the sense that describe what can be done, what cannot be done and what should be done. This model provides computational concepts that allow specify interaction laws in multi-agent systems. Most of this model was already reported in [25] and therefore the concepts are very briefly introduced in this paper to make it self-contained. The outer concept of this model is the LawOrganization. We can see this element as representing the interaction laws (or normative dimension) of a multi-agent organization. A LawOrganization is composed of scenes, clocks, norms and actions. Scenes are interaction contexts that can happen in an organization. They allow modularizing interaction breaking the interaction of the whole system in smaller parts. Clocks introduce global times which are shared by all scenes. Norms capture notions of permissions, obligations and prohibitions regarding agents’ interaction behavior. Actions can be viewed as a consequence of any interaction condition, for example, if an agent acquires an obligation, then the action ‘A’ should be executed. Scenes define an interaction protocol (from a global point of view), a set of norms and clocks that are only valid in the context of the scene. Furthermore, scenes also identify which agents are allowed to start or participate of the scene. Non-deterministic automatons, enhanced by the constraint element and by the possibility of integration with the other elements from the model, represent interaction protocols. Constraints are used as conditional firing of protocol transitions allowing to check complex conditions based on the message contents. To achieve lower coupling levels among these elements, there is also a dynamic model based on events that reify the relationships of the conceptual model in a flexible way. Events are the basis of the communication among law elements, that is, law elements dynamically relate with other elements through event notifications. Basically, we can understand the dynamic of the elements as a chain of causes and consequences, where an event can activate a law element; this law element could generate other events and so on. For example, the arrival of a message generates an event (message arrival), this event may activate a transition (transition activation), transition in its turn may activate a clock (clock activation), which generates a (clock tick) event, and lastly it activates a norm (norm activation). Figure 1 shows this chain of causes and consequences, and Figure 2 summarizes all law elements that can generate and sense events.

Fig. 1. Chain of Events

Fig. 2. Law Elements that generates/senses events

A framework was developed to support the development of open distributed systems providing compliance with both the model of interactions proposed previously and the declarative language XMLaw. The fra mework has a set of modules that supports three types of users: (i) “Law developer” represents the developer responsible for specifying the laws, he must understand the application under construction, know the law con-cepts, and then, specify the laws for the application; (ii) “Agent developer” represents the developer responsible for building the agents of a multi-agent system, those developers know about the existence of the laws and should design the agents in compliance with them; (iii) “Software infrastructure developer” deals with law enforcement soft-ware support. Sometimes there is no software infrastructure that implements the law enforcement mechanism or the existent infrastructures are not suitable for the systems that are under development. In these cases, software infrastructure developers should modify the existent infrastructure, or even construct a new one. The framework provides support for each one of those developers. First, it is provided a declarative language, the XMLaw, for law developers, which allows the specification and maintenance of the interactions of multi-agent systems. XMLaw specifications are interpreted and enforced by the law enforcement framework. The framework provides this software support, which contains a number of hotspots that can be extended by software infrastructure developers. Lastly, agent developers are provided with classes and interfaces that support building agents integrated to the law enforcement software. Most of the framework is implemented as mediator agent modules. The mediator agent monitors all interactions and makes sure that interactions are compliant with the specifications. The mediator performs a number of activities. First, the mediator waits for receiving messages. Once a message arrived, it checks if the message belongs to the mediator protocol. If it does, the mediator proceeds with the protocol execution. Otherwise, if the message belongs to some agent conversation, the mediator starts the process of enforcing, and if the laws allow, the message is redirected to the addressee agent. This sequence of activities is repeated while the mediator agent is running. The proposal here is not to detail the framework, so more details can be found in [26]. Next sections will address both Dima X and XMLaw, so it is very important that all the concepts are restrained so a better exploitation will be done.

3 Fault Tolerance for Multi-Agent Systems There are four essential characteristics of a multi-agent system: a MAS is composed of autonomous software agents, a MAS has no single point of control, a MAS interacts with a dynamic environment, and the agents within a MAS are social (agents communicate and interact with each other and may form relationships). All these situations contribute to a failure state. A failu re occurs when the system produces results that do not meet the speci?ed requirements. A fault is de?ned to be a defect within a component of a MAS that may lead to a failure. There are several faults that may occur. For instance, we can have program bugs, which are errors in programming that are not detected by system testing. We can also have unforeseen states, i.e, the programming does not handle a particular state and testing team did not test for this state. We

can have processor faults, which can be a system crash or a shortage of system resources. There would be communication faults which can occur due to slow downs, failures or other proble ms with the communication links; And, finally, unwanted emerging behaviour, i. e, system behaviour which is not predicted. Emerging behaviour may be bene?cial or detrimental. When a fault occurs in a MAS, interactions between agents may cause the fault to spread throughout the system in unpredictable ways. Several approaches ([17], [18], [19], [20], [21]) address the multi-faced problem of fault tolerance in multi-agent systems. These approaches can be classified in two main categories. The first category focuses especially on the reliability of an agent within a multi-agent system. This approach handles the serious problems of communication, interaction and coordination of agents with the other agents of the system [12]. The second category addresses the difficulties of making reliable mobile agents which are more exposed to security problems [16]. This second category is beyond the scope of this work. The main limit of current replication techniques is that most of them are not quite suitable for implementing adaptive replication mechanisms because in most cases, replication is decided by the programmer and applied statically, before the application starts. This works fine because the criticality of components (e.g., main servers) may be well identified and remains stable during the application session. Opposite to that, in the case of dynamic and adaptive multi-agent applications, the criticality of agents may evolve dynamically during the course of computation. Moreover, the available resources are often limited. Thus, simultaneous replication of all the components of a large-scale system is not feasible [13]. Considering that limitation, a specific and novel framework for replication, named DimaX (see details en next section), which allows dynamic replication and dynamic adaptation of the replication policy (e.g., passive to active, changing the number of replicas) was designed. It was designed to easily integrate various agent architectures, and the mechanisms that ensure dependability are kept as transparent as possible to the application. Basically, there are several approaches to fault tolerance in MASs that can be classified in two groups to know: agent-centric approaches, which build fault tolerance into the agents; and system-centric approaches, which move the monitoring and fault recovering into a separate software entity. Agent replication technique uses aspects of both agent-centric and sys -tem-centric approaches. Each agent must have the capability to utilize replication [22]. The replication technique is already old to the people who work with fault tolerance in distributed systems. Then the agent replication would be the act of creating one or more replicas of one or more agents. The replication degree, or number of each agent replica, will be the criticality degree of it and the how much complex it is to add or remove members in the existent group. There are two main types of replication protocol: active and passive protocols. The first one occurs when all replicas process concurrently all input messages. And the second one occurs when only one of the replicas processes all input messages and periodically transmits its current state to the other replicas in order to maintain consistency [12]. The two replication strategies (active and passive) can be used to replicate agents. Active replication provides a fast recovery delay. So, it is dedicated to applications with re al-time constraints. Moreover, passive replication provides a low overhead under failure but it does not provide short recovery delays. So, the choice of the most suitable strategy relies on the environment context. Active replication must be chosen when the failure rate becomes too high or when the application has real-time constraints. Otherwise, passive replication is most suitable. In most multi-agent applications, the environment context is very dynamic. So, the choice of the replication strategy of each component, which relies on a part of this environment, must be determined dynamically and adapted to the environment changes. Moreover, a multi-agent system component which can be very critical at a moment can loose its criticality later. If we consider the replication cost which is very high, the number of replicas of these components must be therefore dynamically updated. Next sub-section present an overview of DimaX framework and how it contributes to our work.

3.1 DimaX Overview DimaX is the integration between DIMA[29] and DarX[30] [31]. DIMA and DarX have been integrated to build a fault-tolerant multiagent platform. DimaX provides multi-agent systems with several services such as distribution, replication, and naming service [30]. In order to bene?t from fault tolerance mechanisms,

the agent behavior is wrapped in a task of the DarX framework (see Figure 3). Moreover, for a dynamic control of replication, the monitoring architecture has been introduced. Figure 3 gives an overview of DimaX.

Fig. 3. DimaX's Overview

Consider a distributed system consisting of a ?nite set of agents Ai = {A1,A2,. . .,An} that are spread through a network. These agents communicate only by sending and receiving messages. DarX provides global naming, and each agent has a global name which is independent of the current location of its replicas. The underlying system allows handling the agent’s execution and communication. The failure of a machine or a connection often involves the failure of the associated DarX server. However, in this solution the fault tolerance protocols are agent-dependent, and not place-dependent, i.e. the mechanisms built for providing the continuity of the computation are integrated in the replication groups, and not in the servers. For instance, the monitoring agents are built as active components associated to the domain agents. Moreover, DarX provided a fault-detection mechanism. A machine crash - server failure - is handled in three steps within every replication group: − detection of a n eventual failure within the group, − evaluation of the context: new criticality, leader failure, ... − recovery: If the missing replica was the group leader, a new one is elected and an agent monitor is automatically activated. In the other case, it depends on the evaluation; a new follower/backup may or may not be instantiated. Obviously, if a leader without any follower/backup fails, then it will not be recovered. This derives from the original assumptions made: the criticality of an agent evolves during the computation, and there are phases when an agent need not be fault-tolerant. To validate DimaX, several series of experiments were realized. Based on that, we opted for this framework to build our solution. Further details about DimaX implementation and the experiments can be found in [13], [23].

4 The Problem Description To illustrate a problem about how to dynamically calculate the agent criticality considering the organizational structures, we will describe a negotiation scene based on FIPA-CONTRACT-NET [27], which is represented in figure 4. The goal is to discover how the elements of the XMLaw could improve the agent criticality analysis that is done b y DimaX.

Fig. 4. FIPA CONTRACT-NET PROTOCOL

Basically, the Initiator solicits m proposals from other agents by issuing a call for proposals (cfp) act, which specifies the task. Participants receiving the call for proposals are viewed as potential contractors and are able to generate n responses. Of these, j are proposals to perform the task, specified as propose acts. The Participant’s proposal includes the preconditions that the Participant is setting out for the task, which may be the price, time when the task will be done, etc. Alternatively, the i=n-j Participants may refuse to propose. Once the deadline passes, the Initiator evaluates the received j proposals and selects agents to perform the task; one, several or no agents may be chosen. The l agents of the selected proposal(s) will be sent an accept-proposal act and the remaining k agents will receive a reject-proposal act. The proposals are binding on the Partic ipant, so that once the Initiator accepts the proposal, the Participant acquires a commitment to perform the task. Once the Participant has completed the task, it sends a completion message to the Initiator in the form of an inform-done or a more explanatory version in the form of an inform-result. However, if the Participant fails to complete the task, a failure message is sent. Now, suppose that this protocol was described as a state machine (figure 5) where Si are the protocol’s states during its execution and the clocks’ representation are the clocks activation and deactivation for each + or -, respectively. The protocol starts with the state S0 when the Initiator solicits m proposals from other agents and it ends with the states S4, or S5, or S7, it depends on the protocol’s flow.

Fig. 5. Contract -Net State Machine

Now, considering this representation we are going to map another negotiation scene in other to check how the criticality of the agent that is executing the protocol vary. Then suppose the following situation: a customer starts a negotiation sending a proposal for a book to a seller. He informs the maximum price that he will pay for the book. The seller can accept with a proposal or can refuse it. If he accepts, he can send proposals with lesser or equal price informed by the customer. When the customer receives the proposal, he has 20 seconds to decide if he will accept it or not. After 20 seconds, if the customer hasn’t answered the seller, he can sell the product to another customer. Otherwise the seller is not allowed to sell it to anybody else. If the customer refuses it, the seller can re-propose another price. If the customer accepts it, the seller informs the bank where the payment must be done. Then the customer has the obligation of paying for the product and of informing the number of the voucher to the seller. The scene ends then when the customer informs that paid it with the voucher number. Figure 6 shows the state machines of the negotiation protocol and all XMLaw ele ments (clock and norm) activated during its execution.

Fig. 6. Negotiation Scene Description

To summarize the protocol of the negotiation Scene Description, we have: • From state S0 to S1: the customer starts a negotiation sending a proposal for a book to a seller (cfp message). • From state S1 to S2: the seller accepts the customer’s proposal. He sends proposals with lesser or equal price informed by the customer (propose message). This transition activates the clock and the customer has 20 seconds to answer the seller.

• • • •

From state S1 to S6: the seller refuses the customer’s proposal (refuse message) and the protocol ends. From state S2 to S3: the customer accepts the seller’s proposal before the 20 seconds (accept message). From state S2 to S7: the customer refuses the seller’s proposal before the 20 seconds (refuse message). From state S7 to S2: the seller proposes the customer again with another price (propose message) and the clock is activated again. • From state S2 to S8: the customer doesn’t answer the seller and the seller informs the customer that he can offer the book to another customer (inform message) and the protocols ends. • From state S3 to S4: the seller informs the customer the bank where he has to pay for the book (inform message) and he has the obligation to pay in order to receive the book (norm activation). • From state S4 to S5: the customer informs the seller the voucher (inform message) and has the permission the receive the book. The protocol ends. If we consider that when an event (as clock activation/deactivation, norm activation/deactivation, etc.) occurs during the scene execution the agent critilicality can increase or decrease, since the agent becomes more or less important, each element should be analyzed in order to calculate it in the best way. Moreover, another elements and events that might not be handled by XMLaw should be analyzed in order to evaluate how it could influence the agent criticality analysis. For example, when an agent starts playing a role its criticality may increase or decrease.

Fig. 7. Criticality Analysis

In the context of the negotiation scene, when the customer have to answer the seller if he will accept his proposal or refuse it, his criticality may increase, since that agent became very important to the seller and should not crush, for exa mple (it is important to point out that this criteria depends on the domain analysis). So, just to make it more clear, if we could classify the criticality of the agent in three levels, we would have a low (L), medium (M) or high (H) criticality level during the scene execution. Figure 7 shows how it could vary in a time period. Note that we replaced the status representation (s0, s1, …, s9) by the criticality level representation (L, M or H). From s1 to s2 states the customer criticality level increased from low to medium since the clock activation event was fired. From s2 to s8 or s7 states it decreased since the clock deactivation event was fired. From s2 to s3 states it increased even more going from medium to high since the customer has the obligation of paying for the product. Now, if we analyze the seller criticality during the scene execution, we can perceive that from s1 to s2 states its criticality level increased since the customer has called him for a proposal, and then he has to answer him. The main questions which arise from that analysis are:

• How and which elements of the XMLaw could improve the agent criticality analysis that is done by DimaX? • How can we do it in the best way considering coupling, modularity and reuse? Next sections will describe the expected contributions as a consequence of answering those questions and how to solve it.

5 The Proposed Solution In order to integrate the XMLaw and DimaX it was necessary to extend both of them. First we analyzed XMLaw and studied which elements should be inserted, wh ich events should be sensed by the new elements and so on. It was not a trivial work considering that XMLaw is not an extensible framework for adding new elements or generating new events. Until now, it is easy to instantiate it but not to extend it. After that, we analyzed how to integrate it with Dima X and how to extend the criticality analysis done by DimaX. In this section we will describe the proposed solution first from the XMLaw point of view, and second from the DimaX point of vie w.

5.1 XMLaw Extensions The XMLaw conceptual model is composed by concepts which allow representing agents’ interaction aspects. Among those concepts there are scenes, which group the interactions in modules; interaction protocols which define precisely the communication between agents; norms which hold the obligation, permission and prohibition concepts; clocks which add a temporal aspect into interactions allowing that a specific behavior may be valid during a period of time; and actions which allow executing automatic recovery code in the system in a fault case. We have extended XMLaw with two new elements: Role and Criticality Analysis (figure 8). XMLaw didn’t have the concept of Role. The roles played by agents had to be informed by themselves when they ask to enter in an organization and it was associated to a new name of the agent inside the organization.

Fig. 8. XMLaw: New Conceptual Model

With this new element (Role), when an agent asks to enter in an organization, it has to inform the role it wants to play and when a scene is executed, the agent, if accepted, will have to play its role. An

organization has one or more roles to be played by agents and an agent can plays different roles in different organizations. The parser will read the law specification to generate the descriptors which are instantiated by the model level (figure 9). Then the mediator interacts with the model level in order to execute the law enforcement mechanism.

Fig. 9. XMLaw Framework execution

Each organization’s role has identification and a list of norms associated to its Rights. Some norms can be activated and/or deactivated according to agents’ right. Rights describe the permissions on the resources and services available in the environment and about the behavior of the agents. It can activate or deactivate some norms related to that role. The criticality analysis element has two elements: Increases and Decreases. The element Increases has the list of events that contribute for the increase of the agent criticality. And the element Decreases has the list of events that contribute for the decrease of the agent criticality. Each element Increase and Decrease has as attributes: the event identification from the event that was fired, the event type from the event that was fired, the value which is a weight for the increase or decrease contribution of that event and the agent role identification which has all the references to the agent that will have its criticality updated in runtime. For the problem described in the fourth section, for example, the XMLaw specification for the Criticality Analysis is showed in figure 10. When an agent starts playing the customer role, its criticality has to be recalculated and updated by a weight of 0.3. The same happens when an agent starts playing the seller role, its criticality has to be updated by a weight of 0.7. Those actions are executed when the role activation event is fired.

Governing Agent Interaction in Open Multi-Agent ...

Governing Agent Interaction in Open Multi-Agent ...

Suggest Documents