Learning Negotiation Approach Selection with SVMs

Xin Li, Qingping Tao, Leen-Kiat Soh
Department of Computer Science and Engineering, University of Nebraska-Lincoln
115 Ferguson Hall, Lincoln, NE, USA 68588-0115
{xinli, qtao, lksoh}@cse.unl.edu

Abstract

In a multi-agent system, autonomous agents may negotiate with each other over coordination for task fulfillment. An agent may select different approaches to negotiate with individual partners in different negotiation processes. We propose a learning method based on support vector machines (SVMs) to conduct adaptive negotiation approach selection in dynamic and uncertain environments.

1. Introduction

Support vector machines (SVMs), a new generation of learning systems based on recent advances in statistical learning theory [1, 4], are theoretically well founded and have shown significant generalization performance in practice. In this paper, we focus on using SVMs for negotiation approach selection in a multi-agent system. We have designed and implemented an adaptive, confidence-based negotiation strategy for negotiations among agents in dynamic, uncertain, real-time, and noisy environments [3]. In each individual negotiation, a negotiation-initiating agent may employ a pipelined or a packaged negotiation approach for a negotiation-responding agent, based on the confidence it has in that agent. The confidence value is computed from a set of characteristics and their corresponding weights. To improve the accuracy of the confidence computation and to address the dynamic and uncertain characteristics of the negotiation environment, we use SVMs to dynamically learn an adaptive weight setting.

2. Related Work

In a multi-agent system, some agents may form coalitions to perform global tasks. In our previous work [2], when an agent encounters a task requiring coalition formation, it first selects candidates from its peer agents and then conducts a group of 1-to-1 negotiations concurrently with the candidates. Each individual 1-to-1 negotiation is composed of one or more negotiation jobs, and each negotiation job carries out the tactical negotiation actions on specific subtasks.

We have designed and implemented an adaptive, confidence-based negotiation strategy to address the characteristics of the negotiation environment [3]. In each individual negotiation, a negotiation-initiating agent may employ either a pipelined or a packaged negotiation approach. In the pipelined approach, the initiating agent negotiates only one subtask in a negotiation job; as each job is completed, the agent negotiates the remaining subtasks in succeeding jobs, and if the peer cannot negotiate as expected, the remaining jobs are moved from the current pipeline to another peer's pipeline. In the packaged approach, the initiating agent packages multiple subtasks into one negotiation job, which is the only job in that individual negotiation. Our goal is to integrate these two approaches so that an initiating agent can adaptively select between them to improve request satisfaction and cost effectiveness, thereby benefiting from the advantages of both.
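To make the two approaches concrete, the following minimal sketch (our own illustration with hypothetical names, not the authors' implementation) shows how an initiating agent might organize a task's subtasks into negotiation jobs under each approach:

```python
def pipelined_jobs(subtasks):
    """Pipelined approach: one subtask per negotiation job; the jobs are
    negotiated in succession, so if a peer fails, the remaining jobs can be
    moved to another peer's pipeline."""
    return [[subtask] for subtask in subtasks]


def packaged_jobs(subtasks):
    """Packaged approach: all subtasks are bundled into a single negotiation
    job, which is the only job of the individual negotiation."""
    return [list(subtasks)]


subtasks = ["subtask-1", "subtask-2", "subtask-3"]
print(pipelined_jobs(subtasks))  # [['subtask-1'], ['subtask-2'], ['subtask-3']]
print(packaged_jobs(subtasks))   # [['subtask-1', 'subtask-2', 'subtask-3']]
```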

Our strategy for selecting a negotiation approach for a peer is based on the confidence that the negotiation-initiating agent has in that peer. Specifically, an agent measures its confidence in a peer by how consistent that peer's negotiation or coalition behavior is. The peer's behavior is characterized and recorded in the neighborhood profile that the agent maintains. The agent computes the confidence value for each peer from the standard deviations of the relevant profiled characteristics (see Section 3); a characteristic value with a small standard deviation means that the peer exhibits consistency in that particular characteristic. In our previous work, the confidence value of the initiating agent $A_i$ in a peer agent $A_j$'s $k$th characteristic $C^k_{A_j}$ is computed with the following formula:

$$\mathrm{Confidence}_{A_i}^{C^k_{A_j}} = \frac{1}{1 + \sum_{i=1}^{l} \left( C^k_{A_j,t_i} - C^k_{A_j,\mathrm{average}} \right)^2}$$

where $C^k_{A_j,t_i}$ is the perceived value of $C^k_{A_j}$ at time $t_i$ ($i \in [1, l]$) and $C^k_{A_j,\mathrm{average}}$ is the average value of $C^k_{A_j,t_i}$ over that time period.

The (composite) confidence value of the initiating agent in a peer agent is simply a weighted sum of the confidence values of the peer's characteristics. For a specified threshold $\theta_1$, if the (composite) confidence value is greater than $\theta_1$, the packaged approach is selected; otherwise, the pipelined approach is selected. The characteristics generally change over time, while their corresponding weight values are fixed.

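As a minimal sketch of the computation just described (the function and variable names, and the use of NumPy, are our own; the paper does not give an implementation), the per-characteristic confidence and the threshold-based approach selection with fixed weights might look like this:

```python
import numpy as np


def characteristic_confidence(observations):
    """Confidence of the initiator in one profiled characteristic of a peer:
    1 / (1 + sum of squared deviations from the mean over the observed
    period). A consistent characteristic yields a confidence close to 1."""
    values = np.asarray(observations, dtype=float)
    deviations = values - values.mean()
    return 1.0 / (1.0 + float(np.sum(deviations ** 2)))


def select_approach(characteristic_histories, weights, theta_1):
    """Composite confidence = weighted sum of per-characteristic confidences;
    choose the packaged approach if it exceeds the threshold theta_1."""
    confidences = [characteristic_confidence(h) for h in characteristic_histories]
    composite = sum(w * c for w, c in zip(weights, confidences))
    return "packaged" if composite > theta_1 else "pipelined"


# Hypothetical profile of one peer: five characteristics observed over time.
histories = [[0.80, 0.82, 0.79], [0.50, 0.90, 0.20], [0.70, 0.70, 0.70],
             [0.30, 0.35, 0.30], [0.60, 0.55, 0.65]]
fixed_weights = [0.2] * 5  # pre-fixed weights (the simplest setting)
print(select_approach(histories, fixed_weights, theta_1=0.5))
```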

3. Methodology

To compute the confidence value of an agent in a peer agent, we could always use a group of fixed weight values. However, fixed weights may be inaccurate in evaluating how much each characteristic contributes to the confidence value. To improve the accuracy of the confidence computation and to address the dynamism and uncertainty of the negotiation environment, we use SVMs to learn the weight setting. The weight value of each characteristic can then be set dynamically and adapted to different coalition formation processes. Moreover, with the kernel trick, SVMs can implicitly consider very complex feature spaces based only on the basic features, which saves us from extensive feature engineering.

As a pure classification learner, the SVM needs a training set. We use the feature values obtained from the negotiation-initiating agent's past negotiation and coalition experience as training examples. Here we distinguish the features used by the SVM from the characteristics of an agent. An agent's profiled characteristics include: (1) the satisfaction degree of requests to the peer, (2) the satisfaction degree of requests from the peer, (3) the reliance degree of the agent on the peer, (4) the tardiness degree, indicating the communication delay between the agents, and (5) the hesitation degree, indicating how readily the peer agrees to a negotiation job. The features used in the SVM's training examples are the confidence values of the negotiation-initiating agent in the above characteristics of the peer agent, for example, the confidence in the peer agent's tardiness degree. To address the task factor, we also include the task's time-critical requirement and importance degree as features. The (composite) confidence value of an agent in a peer agent can then be formalized as follows:

$$\mathrm{Confidence}_{A_i}^{A_j} = \sum_{k} w_k \,\mathrm{Confidence}_{A_i}^{C^k_{A_j}} + \sum_{l} w_l F_l$$

where $\mathrm{Confidence}_{A_i}^{A_j}$ is the (composite) confidence value of agent $A_i$ in agent $A_j$, $\mathrm{Confidence}_{A_i}^{C^k_{A_j}}$ is the confidence value of $A_i$ in $A_j$'s $k$th characteristic $C^k_{A_j}$ and $w_k$ is its weight, and $F_l$ is a task feature with weight $w_l$.

For a group of feature values, there is a learning reward $\mu$, which is the outcome resulting from selecting a specific negotiation approach. For a specified threshold $\theta_2$, if $\mu > \theta_2$, the selected approach is considered appropriate; otherwise, the other approach should have been selected. We then train the SVM on these examples and obtain a set of new weights.
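As a hedged sketch of how this training step might be realized (scikit-learn, the threshold value, and all names here are our own illustration, and reading the adapted weights off a linear-kernel SVM is only one possible way of obtaining a set of new weights):

```python
import numpy as np
from sklearn.svm import SVC

THETA_2 = 0.6  # hypothetical reward threshold; the paper does not give a value


def make_example(characteristic_confidences, task_features, reward):
    """One training example per past negotiation: the features are the
    initiator's confidence in the peer's five profiled characteristics plus
    the two task features; the label records whether the approach chosen in
    that negotiation turned out to be appropriate (reward above theta_2)."""
    x = np.concatenate([characteristic_confidences, task_features])
    y = 1 if reward > THETA_2 else 0
    return x, y


# Illustrative, synthetic past episodes:
# (characteristic confidences, [time-criticality, importance], reward mu).
episodes = [
    ([0.9, 0.8, 0.7, 0.6, 0.9], [0.2, 0.8], 0.75),
    ([0.4, 0.3, 0.5, 0.2, 0.4], [0.9, 0.5], 0.35),
    ([0.8, 0.9, 0.6, 0.7, 0.8], [0.1, 0.9], 0.80),
    ([0.3, 0.2, 0.4, 0.3, 0.5], [0.8, 0.4], 0.20),
]
X, y = zip(*(make_example(c, t, r) for c, t, r in episodes))

# A linear kernel keeps the learned model interpretable: its coefficients can
# be read off as the adapted weights w_k (characteristics) and w_l (task).
svm = SVC(kernel="linear")
svm.fit(np.array(X), np.array(y))
learned_weights = svm.coef_.ravel()
print(learned_weights)
```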

4. Experiments

To verify the advantages of the method, we design three experiments, each using the same group of tasks. The first experiment uses the no-learning weight setting, in which pre-fixed weights are used. The second uses the offline-learning weight setting, in which the negotiation outcomes of all tasks in the no-learning experiment are used as training examples; we train the SVM on these examples, obtain a set of new weights, and then apply them to the same group of tasks. The third uses the online-learning weight setting, in which new weight values are obtained after every fixed number of completed tasks and are immediately used for the succeeding tasks, so the learning frequency in the third experiment is higher than in the second. At present, we have finished the first experiment. The preliminary results show that agents alternately selected different negotiation approaches for their peers as time progressed. We will complete the other two experiments and then compare the negotiation outcome and cost across all three. We hypothesize that the offline-learning version will select more appropriate negotiation approaches than no-learning, but less appropriate ones than online-learning.

References

[1] Schölkopf, B. and Smola, A. 2002. Learning with Kernels. Cambridge, MA: MIT Press.
[2] Soh, L.-K. and Li, X. 2003. An Integrated Multi-Level Learning Approach to Multiagent Coalition Formation. In Proc. IJCAI'03, pages 619-624, Acapulco, Mexico.
[3] Soh, L.-K. and Li, X. 2004. Adaptive, Confidence-based Multiagent Negotiation Strategy. To appear in Proc. AAMAS'04, New York.
[4] Vapnik, V. 1998. Statistical Learning Theory. New York: John Wiley.
