Design of Negotiation Agents Based on Behavior Models

Kivanc Ozonat and Sharad Singhal
Hewlett-Packard Labs, Palo Alto, CA, USA
[email protected],
[email protected]
Abstract. Despite the widespread adoption of e-commerce and online purchasing by consumers over the last two decades, automated software agents that can negotiate the issues of an e-commerce transaction with consumers still do not exist. A major challenge in designing automated agents lies in the ability to predict the consumer’s behavior adequately throughout the negotiation process. We employ switching linear dynamical systems (SLDS) within a minimum description length framework to predict the consumer’s behavior. Based on the SLDS prediction model, we design software agents that negotiate e-commerce transactions with consumers on behalf of online merchants. We evaluate the agents through simulations of typical negotiation behavior models discussed in the negotiation literature and actual buyer behavior from an agent-human negotiation experiment.
1 Introduction
Negotiation is the process of reaching an agreement between two (or more) parties on the issues underlying a transaction. Negotiations are an integral part of business life, and are instrumental in reaching agreements among the parties. Over the last two decades, electronic commerce has become widely adopted, providing consumers with the ability to purchase products and services from businesses online. Electronic commerce offers many advantages, including increased operational efficiency for businesses, reduced inventory costs, and availability of products and services 24 hours a day. Yet, after two decades, electronic commerce systems still lack the ability to let businesses negotiate with consumers during purchases. There is a need for automated, online software agents that can negotiate with consumers on behalf of businesses.

Designing automated agents for negotiation has been explored across multiple disciplines, including artificial intelligence [5, 6, 7], human psychology [16, 17, 18, 21] and statistical learning [3, 9]. One form of negotiation that has received much attention in these disciplines is the bilateral sequential negotiation. A bilateral sequential negotiation typically starts with one party (e.g., the buyer) making an offer on each of the negotiated issues. The issues can include, for instance, the price of a product or service, its delivery time and its quantity. The opposing party (e.g., the seller) has three options: (i) it accepts the offer, (ii) it rejects the offer and makes a counter offer on each issue, or (iii) it rejects the offer and
terminates the negotiation. The process continues in rounds, where the buyer and the seller make and respond to counter offers.

The artificial intelligence community has focused on finding game-theoretic solutions to the bilateral sequential negotiation problem. Each of the two parties is assumed to have a utility function over the negotiated issues. The utility function measures how preferable an offer is to the party. In the game-theoretic model, each party is assumed to have the goal of maximizing its utility function, and the aim is to find an agreement that is Pareto-optimal, i.e., an agreement where the utility of one party cannot be increased any further without decreasing the utility of the other [6]. Many game-theoretic models of negotiation assume that each party knows the opposing party's utility function [6], while more realistic models employ statistical algorithms that learn the opposing party's utility function [3, 5, 9]. The learning can be based, for instance, on a training set of issue offers from the parties. In either case, a fundamental problem, when one of the parties is a human, is that people rarely construct mathematical functions, such as utility functions, to quantify their issue preferences in a negotiation. Moreover, humans are not "super-rational beings"; during a negotiation, it is unlikely that a person will always make the most rational decisions in terms of maximizing a mathematical function.

We take an alternative approach, and aim to predict the behavior of a human buyer in the issue space instead of estimating a utility function. In [5], the human buyer's negotiation behavior has been modeled directly in the issue space rather than in the utility space. Multiple sets of parameterized functions, which imitate buyers' behaviors in the issue space, are used to construct trajectories that represent the buyer's movement in the issue space from its initial offer to the final (or target) offer.
A typical trajectory consists of a sequence of these parametrized functions, such that the parameters of the functions, the positions of the functions in the sequence, and the times of switching from one function to another in the sequence are determined based on prior probabilities.

We devise and employ statistical learning-based prediction techniques based on the behavior models developed in [5]. Our goal is to predict the future trajectories of the buyer in the issue space based on the history of the buyer's offers. Linear dynamical systems (LDSs) have been used in the literature to predict trajectories in fields as diverse as communications, aeronautics, finance, and navigation [11]. We employ an LDS-based prediction model to predict the buyer's future trajectory. The models in [5] consist of sequences of alternating and changing functions, suggesting a prediction model that accounts for switching behavior. We therefore approach the prediction problem through switching linear dynamical systems (SLDSs). A challenge is that one cannot assume prior knowledge of the SLDS model size. To address this challenge, we extend the SLDS learning to a setting that allows the model size to grow with new offers. We then design agents based on the SLDS prediction model, and evaluate them using two sets of experiments: the first set tests the agents against trajectories generated according to the
models in [5], and the second set tests them against actual buyer behavior from an agent-human negotiation experiment.
2 Negotiation Behavior of Buyers
A (bilateral sequential) negotiation occurs between two parties, the buyer and the seller of some product or service. The two parties negotiate over D issues such as the price of the product, the delivery time and the product quantity. At each negotiation round n, n ≥ 1, the buyer makes an offer, y_n, to the seller. The offer y_n is a D-dimensional real-valued vector with its dth element y_{n,d} denoting the buyer's offer for the dth issue. The seller responds with one of three options: (i) continue the negotiation with a counter offer, which, analogous to y_n, is also a D-dimensional vector, (ii) terminate the negotiation by accepting y_n (i.e., a deal is reached), (iii) terminate the negotiation by rejecting y_n (i.e., no deal is reached).

We assume that the seller is a software agent, and the buyer is a human. The buyer has a set of targets that it does not disclose to the seller. A target is an offer that the buyer hopes the seller will accept by the termination of the negotiation. Often, the buyer has multiple targets. For instance, in a scenario with 2 negotiated issues, the price and delivery time of a product, the buyer may have targets of {240 dollars and 2 weeks}, {220 dollars and 3 weeks}, and {160 dollars and 4 weeks}.

The buyer's negotiation behavior has been modeled in [5]. According to the models discussed in [5], the buyer rarely offers one of its targets to the seller in the initial round, i.e., at n = 1. Instead, at n = 1, it makes an offer that is more advantageous to itself than its targets. For instance, a buyer with a target of {240 dollars and 2 weeks} may offer only {200 dollars and 1 week} at n = 1. The buyer then follows a trajectory in the issue space from this initial offer towards one of its targets. Often, the buyer's trajectory never reaches the target (i.e., the buyer never offers the target), with the intention of obtaining more favorable counter offers from the seller.
The offer strategy of a buyer specifies the buyer's movement in the issue space from one round to the next. The negotiation may be viewed as consisting of multiple phases or states, s = 1, 2, 3, ..., where, at each state s, the buyer follows some offer strategy. Each time the buyer changes its strategy, a new state starts. The buyer can return to a previously-visited state. For instance, the buyer might make small concessions both in price and delivery time for 1 ≤ n ≤ 5 (state s = 1), make large concessions in price while holding the delivery time constant for 6 ≤ n ≤ 8 (state s = 2), and make small concessions both in price and delivery time for n > 8 (back to state s = 1).

We design a seller's agent that negotiates with the buyer. At every round n, the agent makes a counter offer to the buyer based on a prediction model that utilizes the history of the buyer's offers up to and including round n. Thus, the counter offer is based on the future trajectory of the buyer as predicted by the agent. The agent estimates the likelihood of the buyer's future steps in the issue space, and selects a counter offer that is likely to be along the buyer's
future trajectory. The prediction model and the agent are discussed in sections 3 through 5.
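The three-phase example above can be made concrete with a minimal simulation sketch. The concession sizes, phase boundaries, and initial offer below are hypothetical, chosen only to illustrate the switching behavior; this is not the paper's generator.

```python
# Hypothetical sketch: a buyer's offer trajectory over two issues
# (price, delivery weeks) that switches concession strategy by phase,
# as in the example above: small concessions on both issues, then large
# price concessions with delivery held fixed, then small concessions again.

def buyer_offers(initial, rounds):
    price, weeks = initial
    trajectory = []
    for n in range(1, rounds + 1):
        if n <= 5:                 # state s = 1: small concessions on both
            price += 2.0
            weeks += 0.1
        elif n <= 8:               # state s = 2: large price concessions only
            price += 15.0
        else:                      # back to state s = 1
            price += 2.0
            weeks += 0.1
        trajectory.append((round(price, 2), round(weeks, 2)))
    return trajectory

traj = buyer_offers((200.0, 1.0), 10)   # 10 rounds of simulated offers
```

A predictor that fits a single linear model to such a trajectory will be biased by the strategy switches, which is the motivation for the switching model of section 3.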
3 Buyer Prediction Model
Linear dynamical systems (LDSs) have been used to model dynamical phenomena in multiple disciplines as diverse as communications, aeronautics, finance, tracking and navigation [11]. While the trajectory of the buyer in the issue space in a given state s can be captured by an LDS, the fixed (Gaussian) model of the LDS is insufficient to explain the buyer’s behavior across multiple states. A switching linear dynamical system (SLDS) is a variation on the LDS, where the fixed Gaussian model is replaced with a conditional Gaussian model [11,12]. The underlying conditional Gaussian model provides the flexibility needed to describe dynamical phenomena that exhibit structural changes over time. We model the buyer’s negotiation behavior through an SLDS with the transition model given by
x_n = A(s_n) x_{n−1} + v_n(s_n),   (1)
y_n = C x_n + w_n,   (2)
p_{i,j} = P(s_n = i | s_{n−1} = j),   (3)

where x_n ∈ R^D is the hidden LDS state at round n, y_n ∈ R^D is the observation at round n, w_n ∈ R^D is the Gaussian measurement noise, v_n(s_n) ∈ R^D is the Gaussian state noise, A(s_n) ∈ R^{D×D} is the LDS state transition matrix, and C ∈ R^{D×D} is the LDS observation matrix. x_1 is Gaussian. The joint distribution of X, Y and S is given by

P(y_1^n, x_1^n, s_1^n) = ∏_{m=1}^{n} P(y_m | x_m) · P(x_1 | s_1) ∏_{m=2}^{n} P(x_m | x_{m−1}, s_m) · P(s_1) ∏_{m=2}^{n} P(s_m | s_{m−1}),   (4)

where y_1^n denotes the sequence y_1, ..., y_n, and similarly for x_1^n and s_1^n.

While the SLDS learning problem has been addressed in the literature [11, 12], the focus has been on learning an SLDS with the number of switching states known a priori. In our setting, it is not possible to assume a priori knowledge of the model size. The model size can grow with new issue offers from buyers. We address this problem in the next section, where we discuss the infinite SLDS within the minimum description length framework.
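As a concrete illustration of the transition model (1)-(3), the following sketch samples an offer sequence from a two-state SLDS. All parameter values (A(s), the noise scales, and the transition matrix) are made up for illustration, and C is taken to be the identity.

```python
import numpy as np

# Generative sketch of the SLDS in (1)-(3): two switching states,
# hypothetical parameters, C = I.
rng = np.random.default_rng(0)
D, K = 2, 2
A = [np.eye(D) * 0.98, np.eye(D) * 0.90]     # A(s) per switching state
noise_scale = [0.05, 0.5]                    # std of the state noise v_n(s_n)
P = np.array([[0.9, 0.1],                    # p_{i,j} = P(s_n = i | s_{n-1} = j)
              [0.1, 0.9]])                   # columns sum to 1

s = 0
x = rng.normal(size=D)                       # x_1 is Gaussian
offers = []
for n in range(20):
    s = rng.choice(K, p=P[:, s])             # switching-state transition, eq. (3)
    x = A[s] @ x + rng.normal(scale=noise_scale[s], size=D)   # eq. (1)
    y = x + rng.normal(scale=0.01, size=D)   # eq. (2) with C = I
    offers.append(y)

offers = np.stack(offers)                    # 20 simulated offer vectors
```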
4 Infinite SLDS Modeling with the MDL Principle
We focus on the principle of minimum description length (MDL), with its roots in the works of Kolmogorov, Solomonoff and Shannon [10, 13, 14], to model and learn the parameters of an infinite SLDS. We use Shannon-optimal code lengths to
compute the minimum code length required for coding the observations y_1^T. Any code requires a codebook, i.e., a lookup table that shows the mapping between the data to be coded and the code. In the SLDS setting, the codebook grows as the number of switching states increases. We find the SLDS model that minimizes the description length, which is the sum of the Shannon-optimal code lengths and the size of the codebook. Thus, the number of switching states of the SLDS does not have to be pre-defined; it is discovered through the MDL minimization.

4.1 Related work on infinite-length HMMs
The standard learning procedures for hidden Markov models (HMMs) assume that the number of hidden states is finite and known a priori. Beal [2], Teh [15] and Fox [8] extended HMMs to have a countably infinite number of hidden states through the hierarchical Dirichlet process. The hierarchical Dirichlet process has three hyper-parameters, two of them controlling the tendency of the process to enter a new state, and one controlling the tendency of a state to self-transition. The Dirichlet model allows the process to discover new states through these hyper-parameters, and further ensures that the process can exit from a newly discovered state with non-zero probability.

While the hierarchical Dirichlet model makes it possible to learn infinite HMMs, the inherent assumption is that the hidden state process is first-order Markov. The first-order Markov assumption is not valid for the hidden state process of the SLDS due to the presence of the switching state s.

4.2 MDL for the SLDS
The principle of minimum description length (MDL) stems from Shannon's coding theory [1, 13, 19]. Given a set of hypotheses indexed by h, the observed data y can be described with hypothesis h using total code length l(y|h) + l(h). Here, l(h) is the minimum code length to describe the selected hypothesis h, and l(y|h) is the minimum code length to describe y according to the selected hypothesis h. Each hypothesis h induces a probability distribution on the observed data. Based on Kraft's inequality, l(y|h) = − log P(y|h) and l(h) = − log P(h). The MDL principle selects the hypothesis h = h* if h* minimizes l(y|h) + l(h) among all h.

We let the switching state s play the role of the hypothesis h. This leads to the following code length for the SLDS given in (1)-(4) at time T:

∑_{n=1}^{T} l_{Θ_1}(y_n | y_1^{n−1}, s_1^n) + ∑_{n=1}^{T} l_{Θ_2}(s_n | s_{n−1}),   (5)
where Θ_1 comprises the parameters A(s), C, and the covariance matrices for v(s) and w, Θ_2 is the pmf p_{i,j}, l_{Θ_1} maps y_n | (y_1^{n−1}, s_1^n) to its minimum code length under Θ_1, and l_{Θ_2} maps s_n | s_{n−1} to its minimum code length under Θ_2.

A codebook is a lookup table used for coding and decoding; it shows the mapping between the values of y and s and the code. In the infinite SLDS
setting, the size of the codebook is not a constant. As new switching states are discovered, the codebook grows. Thus, the size of the codebook should be included in the description length. The description length therefore is

∑_{n=1}^{T} l_{Θ_1}(y_n | y_1^{n−1}, s_1^n) + ∑_{n=1}^{T} l_{Θ_2}(s_n | s_{n−1}) + C_{Θ_1} + C_{Θ_2},   (6)

where C_{Θ_1} and C_{Θ_2} denote the size of the codebook for the mapping between y_n | (y_1^{n−1}, s_1^n) and its code, and the size of the codebook for the mapping between s_n | s_{n−1} and its code, respectively.

Learning. The parameters Θ_1 and Θ_2 are learned to minimize (6) through the generalized EM algorithm. The E-step is the inference, i.e., the estimation of the posterior probabilities of the hidden states x_1^T and the switching states s_1^T given the observations y_1^T and the parameters Θ_1 and Θ_2. The M-step consists of the parameter updates. We focus on the case with a fixed number of switching states in this section (i.e., the codebook sizes are constant), and extend it to the infinite setting in the next section.

Suppose n = T and there are K states. During the E-step, the switching states s_1^T are set, and the posterior probabilities of x_1^T are computed. We denote by H_{T,k} the minimum value of (5) when the switching state sequence s_1^T ends with s_T = k. H_{n,k} is computed recursively as

H_{n,k} = min_j [ H_{n−1,j} + l_{Θ_1}(y_n | y_1^{n−1}, s_1^{n−2,*}(j), s_{n−1} = j, s_n = k) + l_{Θ_2}(s_n = k | s_{n−1} = j) ],   (7)

where 1 ≤ n ≤ T, and s_1^{n−2,*}(j) is the switching state sequence that minimizes H_{n−1,j} among all state sequences up to time n − 2.

Based on the MDL, the second term on the right side of (7) is given by − log P(y_n | y_1^{n−1}, s_1^{n−2,*}(j), s_{n−1} = j, s_n = k). This probability is computed through the standard forward algorithm for Gaussian LDS inference, based on the Θ_1 parameters estimated in the M-step. The standard LDS inference can be used since the hidden states x_1^n of the SLDS are first-order Markov when conditioned on the switching state sequence s_1^n. The third term on the right side of (7), which is − log P(s_n = k | s_{n−1} = j), is computed through the Θ_2 estimate of the M-step. Once all the costs H up to time n = T are computed for every state, the index of the switching state that minimizes H_{T,k} is selected, and the switching states are back-traced to find the best sequence s_1^T. The M-step updates the parameters Θ_1 and Θ_2 based on the posterior probabilities (from the standard LDS inference of the E-step) and the switching states from the E-step.
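The recursion in (7) is a Viterbi-style dynamic program over switching states. A minimal sketch follows; `l1` and `l2` are stand-ins for the code-length functions l_{Θ_1} and l_{Θ_2} (here filled with toy values, since the real ones come from the LDS inference), and the pre-round base case is an artificial dummy state.

```python
# Sketch of the recursion in (7): minimize total code length over
# switching-state sequences and back-trace the best sequence s_1^T.

def min_codelength_path(T, K, l1, l2):
    H = [[0.0] * K]                    # artificial base case before round 1
    back = []
    for n in range(1, T + 1):
        row, ptr = [], []
        for k in range(K):
            # H_{n,k} = min_j H_{n-1,j} + l1(n, j, k) + l2(k, j)
            costs = [H[-1][j] + l1(n, j, k) + l2(k, j) for j in range(K)]
            j_best = min(range(K), key=costs.__getitem__)
            row.append(costs[j_best])
            ptr.append(j_best)
        H.append(row)
        back.append(ptr)
    # back-trace from the best final state
    k = min(range(K), key=H[-1].__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.pop()                         # drop the pre-round dummy state
    return list(reversed(path)), min(H[-1])

# Toy code lengths: observations are cheap in state (n mod 2);
# each switch costs one extra bit.
l1 = lambda n, j, k: 0.5 if k == n % 2 else 5.0
l2 = lambda k, j: 0.0 if k == j else 1.0
path, total = min_codelength_path(6, 2, l1, l2)
```

With these toy costs the best path alternates with the parity of n, and the total is six observation terms of 0.5 bits plus five switches of 1 bit.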
New switching states. According to the MDL, an observation is said to be unexplainable if, for every hypothesis in the set of hypotheses, the description length of the observation conditioned on the hypothesis is greater than the unconditional description length of the observation. Consequently, if

min_k l_{Θ_1}(y_n | y_1^{n−1}, s_1^{n−1}, s_n = k) > − log P(y_n),   (8)

where the minimum is over the existing states k, 1 ≤ k ≤ K, the existing switching model is insufficient to explain the observation y_n. If (8) is satisfied, we add the time index n to the set P, the set of time steps with observations not explained by the switching model.

Suppose there exist K states prior to time T, and the codebooks are C_{Θ_1} and C_{Θ_2} for the existing K-state model. We seek a subset p ⊆ P and new codebooks C̃_{Θ_1} and C̃_{Θ_2} (for the (K+1)-state model) such that (6) decreases when the observations in p are assigned to the new state K + 1. In particular, we check whether

min_k H_{T,k} + C_{Θ_1} + C_{Θ_2} > min_k H̃_{T,k} + C̃_{Θ_1} + C̃_{Θ_2},   (9)

where H̃_{T,k} has the same switching state sequence path as H_{T,k} except that the members of the subset p are assigned to the new state K + 1. If (9) holds, then at time T a new state K + 1 is formed.

The best subset p is selected by trying multiple subsets of P and selecting the one that minimizes the right-hand side of (9). Although the number of subsets to try may be large (it takes 2^{|P|} evaluations to try all possible subsets), one may always initialize the EM based on a subset selected from a small number of candidates. The approach provides a convenient way to initialize the EM algorithm for the new model with K + 1 switching states. The EM algorithm described in the previous section is then applied to estimate the parameters, the switching states, and the posterior probabilities.

Code lengths. We use Shannon-optimal code lengths in computing the length functions l [4, 13]. Thus,

l_{Θ_2}(s_n = k | s_{n−1} = j) = − log P(s_n = k | s_{n−1} = j).   (10)

As y_n | (y_1^{n−1}, s_1^n) is real-valued, we need to quantize it before coding. Since y_n | (y_1^{n−1}, s_1^n) is Gaussian, it can be shown that its quantized version can be coded with a Shannon-optimal code length of approximately [4]

(1/2) log( (2πe)^D |Σ_{y_n | (y_1^{n−1}, s_1^n)}| ) − log ∆,   (11)
where Σ_{y_n | (y_1^{n−1}, s_1^n)} is the covariance matrix, and ∆ is the volume of the quantization bin. Here, ∆ is assumed to be small.

In the SLDS setting, the quantization bin volume ∆ acts as a parameter that controls the tendency to discover new states. The function of the parameter ∆ is similar to that of the hyper-parameters of the Dirichlet model of [2, 8, 15] that control the tendency of the process to transition into new states. It may appear as if ∆ has no effect on the minimization in (6), since log ∆ is an additive constant. However, ∆ influences the codebook size. As ∆ is increased, the quantization gets coarser, and thus the expected code lengths for y_n | (y_1^{n−1}, s_1^n) decrease. This decreases the size of the codebook, leading to more switching states. More importantly, the parameter ∆ has an actual meaning in the negotiation setting. For instance, the quantization bin width should be set small enough to ensure that significant differences in the issue space are preserved. If the seller of a laptop computer knows that a price difference of more than 50 dollars is very important to a buyer in the transaction, then ∆ should be set at a value below 50.
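The code lengths (10) and (11) can be computed directly. The sketch below uses base-2 logarithms (code lengths in bits, an assumption, since the base is not fixed above) and hypothetical covariance and bin values; it also shows how a coarser bin shortens the code for the same Gaussian.

```python
import math

def transition_codelength(p):
    """Eq. (10): code length (bits) of a switching-state transition of probability p."""
    return -math.log2(p)

def gaussian_codelength(cov_det, D, delta):
    """Eq. (11): approximate code length (bits) of a quantized D-dim Gaussian,
    given the determinant of its covariance and the bin volume delta."""
    return 0.5 * math.log2((2 * math.pi * math.e) ** D * cov_det) - math.log2(delta)

# A coarser bin (larger delta) yields a shorter code for the same Gaussian,
# which is how delta steers the tendency to form new states.
fine = gaussian_codelength(cov_det=1.0, D=2, delta=0.1)
coarse = gaussian_codelength(cov_det=1.0, D=2, delta=0.5)
```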
5 Experiments with Simulated Trajectories

5.1 The Agent
The seller's agent is a software agent that negotiates the issues with buyers on behalf of the seller. The agent has available to it a rank-ordered list of the issue vectors, such that vectors that rank higher are more advantageous to the seller. The agent is also provided with some cut-off point in the list; vectors that rank higher than the cut-off are target vectors of the seller. The agent is provided with neither the rank-ordered list of the buyer nor the buyer's target vectors.

The goal of the agent is to make smart counter offers to the buyer, i.e., counter offers that are target vectors for the seller and are likely to be along the buyer's trajectory. Hence, the agent should be able to predict the future trajectory of the buyer, and make counter offers that are compatible with the predicted trajectory. Given the history of the buyer's offers up to (and including) time T, the agent needs to predict the buyer's offers for n > T. Prediction models, such as the SLDS, provide conditional probabilities of the future trajectories of the buyer for n > T given the history of the buyer's offers for n ≤ T.

The agent's goal is to make a counter offer that satisfies two conditions: first, the counter offer should be a target vector for the seller, and second, the counter offer should have a high likelihood of being along the future trajectory of the buyer. Towards this end, at time T, for each target vector u of the seller, we compute the probability q_{u,t} that the buyer's trajectory will pass through target u at or before time T + t. This probability is computed based on a prediction model (e.g., LDS or SLDS) that predicts the buyer's future trajectory from its offer history. We then set a probability threshold Q such that any target u with probability q_{u,t} > Q is considered a candidate counter offer. Of the candidate
Fig. 1. Buyer utility values of the agent's counter offers for the SLDS and other algorithms. Figures are for the conceder behavior. R=10 (left) and R=30 (right). SLDS (red, solid), infinite LDS (black, dashed), finite LDS (magenta, dashed), auto-regressive (green, dash-dot), replication (blue, dashed)
counter offers, the agent selects the one with the lowest value of t. This ensures that the counter offer has a high likelihood of being on the buyer's path; a smaller value of t implies that the selected offer is more favorable to the buyer's current state of mind.
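The selection rule above can be sketched as follows, assuming q_{u,t} is estimated from Monte Carlo rollouts of the prediction model (an assumption; any model that yields q_{u,t} would do). The targets, rollouts, and equality test below are hypothetical single-issue examples.

```python
# Sketch: estimate q_{u,t} (probability the buyer's predicted trajectory
# passes through seller target u at or before T+t) from rollouts, then
# return the qualifying target with the smallest t.

def select_counter_offer(targets, rollouts, t_max, Q, close=lambda a, b: a == b):
    for t in range(1, t_max + 1):               # smallest t first
        for u in targets:
            hits = sum(
                any(close(u, offer) for offer in r[:t]) for r in rollouts
            )
            if hits / len(rollouts) > Q:        # q_{u,t} exceeds threshold Q
                return u, t
    return None, None                           # no candidate counter offer

# Hypothetical 1-issue example: three seller targets, four predicted rollouts.
targets = [180, 190, 200]
rollouts = [[150, 160, 180], [150, 180, 190], [140, 160, 180], [150, 170, 185]]
offer, t = select_counter_offer(targets, rollouts, t_max=3, Q=0.5)   # (180, 3)
```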
5.2 Results
In [5], buyers' negotiation behaviors have been modeled through a sequence of functions; each function imitates the buyer's offers in the issue space. The functions are monotonic in the issue values and belong to families of parametrized functions. The parameters determine whether the buyer's behavior is boulware or conceder. We use the families of parametrized functions in [5] to generate buyer trajectories in the multi-issue space.

We evaluate the performance of the seller's agent. The agent negotiates issues with buyers on behalf of the seller, where the buyer's offers on the issues are generated through the simulations. The agent has available to it a rank-ordered list of the issue vectors, with the higher-ranking vectors being more advantageous to the seller. The cut-off point (described in section 5.1) is set such that R percent of the issue vectors are the seller's target vectors, for some R. The buyer has a different rank-ordered list of the issue vectors, and its cut-off point is set such that 20 percent of the issue vectors are the buyer's target vectors.

At each time n = T, the agent computes each q_{u,t}, as described in section 5.1, where t is set so that t + T = t_max, the total number of rounds. The probability threshold Q is set such that the top 10 percent of the q_{u,t} values are selected as candidate offers. We compute the probability q_{u,t} using five prediction models: replication (y_{T+t} = y_T), auto-regressive modeling (y_{T+t} = a_t y_T + z_T, for Gaussian noise z), fixed-length LDS, infinite-length LDS and the SLDS. The fixed-length LDS learning was done through an EM framework with the E-step being a standard LDS forward inference. The infinite-length LDS learning was done through the
Fig. 2. Buyer utility values of the agent's counter offers for the SLDS and other algorithms. Figures are for the boulware behavior. R=10 (left) and R=30 (right). SLDS (red, solid), infinite LDS (black, dashed), finite LDS (magenta, dashed), auto-regressive (green, dash-dot), replication (blue, dashed)

Fig. 3. Buyer utility values of the agent's counter offers for the SLDS and other algorithms. Figures are for the linear concession behavior. R=10 (left) and R=30 (right). SLDS (red, solid), infinite LDS (black, dashed), finite LDS (magenta, dashed), auto-regressive (green, dash-dot), replication (blue, dashed)
Dirichlet formulation of [2, 8, 15]. For the Dirichlet formulation of the infinite LDS and for the SLDS, we selected the hyper-parameters and the ∆ values, respectively, to get the best outcomes.

In Fig. 1, we assigned a utility score (in the 0-100 range) to each issue point, with higher-ranking points (in the buyer's rank-ordered list) receiving higher utility scores, and computed the mean of the utility values of the agent's counter offers. In Fig. 1, the concession behavior of the buyer is that of a conceder. The SLDS performed best, followed by the LDS (fixed and infinite). The performance of the simple replication and auto-regressive models was poor compared to the SLDS. Thus, the SLDS was able to provide the best counter offers to the buyer, in terms of maximizing the buyer's utility while ensuring that the counter offers are always selected from the seller's target offers. As the maximum number of rounds is increased, the performance of each prediction algorithm improves, as there is more offer history upon which the agent can build the predictions. As the value of R is increased (from left to
right), the agent allows more target vectors for the seller, and thus the utility for the buyer increases.

In Fig. 2, the experiment is repeated for a buyer with a boulware behavior. As in the conceder case, the SLDS performed best, followed by the LDS (fixed and infinite), while the simple replication and auto-regressive models performed poorly. The mean utility values are lower than those for the conceder behavior; this is expected, because it is more difficult for the agent to predict the behavior of a boulware buyer.

Finally, in Fig. 3, the experiment was repeated for a buyer with a linear concession behavior. In this case, the performance difference (in terms of mean buyer utility values) between the SLDS and the other algorithms is more significant, implying that the SLDS's advantage over the other algorithms is largest in a linear concession setting.
6 Experiments with Actual Buyer Behavior
We also validated the model against actual buyer behavior from an agent-human negotiation experiment [20]. In each case of the experiment, the buyer role was played by a human subject, who was asked to make its utility in the final agreement with the seller as high as possible while meeting a minimum threshold of 44 utility points. The buyer was not restricted to any specific offer and acceptance strategies. The seller role was played by a software agent coded with four negotiation strategies [20]; the seller agent had the same minimum threshold of 44 utility points. In all four strategy manipulation conditions, the agent proposed a final offer to the buyer for decision. Each data point therefore comprises a buyer's offer history over all negotiation rounds, up to a maximum of 9 rounds.

We compared the p-step prediction error of the infinite SLDS with other approaches for p = 1 and p = 2. The comparisons are included in Fig. 4. We varied the parameter ∆ from 0.01 to 1, and, for comparison purposes, converted each issue range to a non-integer scale from 0 to 10. The prediction errors in Fig. 4 are the squared errors averaged over all rounds, 110 subjects and 4 issues.

For very small values of ∆, the infinite SLDS model size grew very slowly, with too few switching states s, leading to poor prediction accuracy. As ∆ was increased, the prediction accuracy improved until .1 < ∆ < .2. Then, for ∆ > .2, the cost of forming new states became too low and the model started to produce too many new states, lowering the prediction accuracy. At ∆ = .2, the parameter value that minimized the prediction error, we observed an unexplainable issue offer (implying that a new state is necessary) at a rate of approximately once in every 30 offers. It would be interesting to explore whether this rate has any significance to the human psychology community.
The other four prediction models included in the comparisons in Fig. 4 were simple replication (y_{t+p} = y_t), auto-regressive modeling (y_{t+p} = a_p y_t + z, for Gaussian noise z), fixed-length LDS and infinite-length LDS. The simple
Fig. 4. Prediction comparison of the SLDS with other algorithms. 1-step (left) and 2-step (right) prediction errors. SLDS (red, solid), infinite LDS (black, dashed), finite LDS (magenta, dashed), auto-regressive (green, dash-dot), replication (blue, dotted)

Fig. 5. Buyer's utility value for counter offers. Seller's thA low (left) and seller's thA high (right). SLDS (red, solid), infinite LDS (black, dashed), finite LDS (magenta, dashed), auto-regressive (green, dash-dot), replication (blue, dotted)
replication and the auto-regressive model led to high prediction errors, implying the need for a state-space based model.

We also quantified how well a "smart agent" would perform, where a smart agent is one that makes counter offers by predicting the buyer's trajectory. The selected counter offer is the one that is closest to the predicted trajectory of the buyer (closest in Euclidean distance, after converting each issue range to the 0 to 10 scale), provided it is above the threshold thA (the cut-off point described in section 5.1) in the agent's list. Nine counter offers are made per buyer. In Fig. 5, we assigned a utility score (in the 0-100 range) to each issue point, with higher-ranking points (in the buyer's rank-ordered list) receiving higher utility scores, and computed the mean utility value of the agent's best counter offer (of the 9 counter offers). The SLDS performed best, followed by the LDS (fixed and infinite), while the simple replication and auto-regressive models performed poorly. Thus, the SLDS was able to provide the best counter offers to the buyer, in terms of maximizing the buyer's utility while ensuring that the counter offers are above rank thA.
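The p-step error comparison for the two simplest baselines named above can be sketched as follows. The offer sequence and AR coefficient are hypothetical, and the AR predictor uses the noise-free mean a_p · y_t.

```python
# Sketch: mean p-step squared prediction error for the two baselines,
# replication (y_{t+p} = y_t) and one-coefficient auto-regression
# (y_{t+p} = a_p * y_t), over a single-issue offer sequence.

def p_step_errors(offers, p, a_p):
    rep, ar = [], []
    for t in range(len(offers) - p):
        target = offers[t + p]
        rep.append((offers[t] - target) ** 2)          # replication baseline
        ar.append((a_p * offers[t] - target) ** 2)     # auto-regressive baseline
    n = len(rep)
    return sum(rep) / n, sum(ar) / n

# A steadily conceding buyer (offers rescaled to the 0-10 range),
# growing by exactly 10% per round, so the AR model with a_p = 1.1 fits.
offers = [2.0, 2.2, 2.42, 2.662, 2.9282]
rep_err, ar_err = p_step_errors(offers, p=1, a_p=1.1)
```

On this geometric sequence the AR baseline is nearly exact while replication lags by one concession per step, mirroring why the flat replication model fared worst in Fig. 4.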
7 Conclusions
Electronic commerce provides consumers with the ability to purchase products and services from businesses online. Despite its widespread adoption, electronic commerce systems still lack the ability to let businesses negotiate with consumers during purchases. We develop a seller's agent, a software agent that negotiates the issues with buyers on behalf of the online merchant. The agent makes smart counter offers to the buyer, i.e., counter offers that are profitable for the seller and are likely to be accepted by the buyer. For this, the agent uses prediction models to predict the future trajectory of the buyer, and makes counter offers that are compatible with the predicted trajectory.

We predict the behavior of the human buyer in the issue space; the human negotiation behavior models in [5] are used to build the predictors. Our approach is more realistic than the traditional game-theoretic approach of estimating utility functions, since human buyers rarely construct mathematical functions prior to negotiation to quantify their issue preferences. Further, humans are not super-rational beings; it is unlikely that a human will always make the most rational decisions in terms of maximizing a mathematical function. To capture the switching behavior, we extend the LDS to the SLDS prediction model, and we allow the model size to grow with new offers through the MDL setting.

We are working towards a new experiment in which actual human subjects will negotiate with the agent of section 5. We are also exploring how an agent's counter offers influence the buyer behavior. Yet another research thread is to utilize the findings of buyer negotiation psychology research in designing software agents.
8
Acknowledgements
We would like to thank Yinping Yang and Yunjie Xu for the data used in this paper from their negotiation experiments [20].

References

[1] A. Barron, J. Rissanen, and B. Yu. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory, 44(6):2743-2765, 1998.
[2] M. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. In NIPS, 2002.
[3] R. M. Coehoorn and N. Jennings. Learning an opponent’s preferences to make effective multi-issue negotiation trade-offs. In ICEC, 2004.
[4] T. Cover and J. Thomas. Elements of Information Theory. Wiley, New York, 1991.
[5] P. Faratin, C. Sierra, and N. R. Jennings. Negotiation decision functions for autonomous agents. International Journal of Robotics and Autonomous Systems, 24(3-4):159-182, 1998.
[6] S. Fatima, M. Wooldridge, and N. R. Jennings. Multi-issue negotiation under time constraints. In AAMAS, 2002.
[7] S. Fatima, M. Wooldridge, and N. R. Jennings. An agenda-based framework for multi-issue negotiation. Artificial Intelligence Journal, 152(1):1-45, 2004.
[8] E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. An HDP-HMM for systems with state persistence. In ICML, 2008.
[9] T. Ito, H. Hattori, and M. Klein. Multi-issue negotiation protocol for agents: Exploring nonlinear utility spaces. In IJCAI, 2007.
[10] A. N. Kolmogorov. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1:4-7, 1965.
[11] V. Pavlovic, J. M. Rehg, T. J. Cham, and K. P. Murphy. A dynamic Bayesian network approach to figure tracking using learned dynamic models. In Intl. Conf. on Computer Vision, 1999.
[12] V. Pavlovic, J. Rehg, and J. MacCormick. Learning switching linear models of human motion. In NIPS, 2001.
[13] C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL, 1949.
[14] R. J. Solomonoff. A formal theory of inductive inference, parts I and II. Information and Control, 7:1-22, 224-254, 1964.
[15] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2006.
[16] G. A. Van Kleef, C. K. W. De Dreu, and A. S. R. Manstead. The interpersonal effects of anger and happiness in negotiations. Journal of Personality and Social Psychology, 86:57-76, 2004.
[17] G. A. Van Kleef, C. K. W. De Dreu, and A. S. R. Manstead. The interpersonal effects of emotions in negotiations: A motivated information processing approach. Journal of Personality and Social Psychology, 87:510-528, 2004.
[18] R. Vetschera. Preference structures and negotiator behavior in electronic negotiations. Decision Support Systems, 44:135-146, 2007.
[19] C. Wallace and D. L. Dowe. Minimum message length and Kolmogorov complexity. The Computer Journal, 42(4):270-283, 1999.
[20] Y. Yang, S. Singhal, and Y. Xu. Offer with choices and accept with delay: A win-win strategy model for agent-based automated negotiation. In ICIS, 2009.
[21] G. A. Yukl. Effects of situational variables and opponent concessions on a bargainer’s perception, aspirations, and concessions. Journal of Personality and Social Psychology, 29:227-236, 1974.