TATM: A trust mechanism for social traders in double auctions

Jacob Dumesny¹, Tim Miller¹*, Michael Kirley¹, and Liz Sonenberg²

¹ Dept. of Computer Science & Software Engineering, University of Melbourne
² Dept. of Information Systems, University of Melbourne
[email protected] {tmiller,mkirley,l.sonenberg}@unimelb.edu.au

Abstract. Traders that operate in markets with multiple competing marketplaces can use learning to choose in which marketplace they will trade, and how much they will shout in that marketplace. If traders are able to share information with each other about their shout price and market choice over a social network, they can trend towards the market equilibrium more quickly, leading to higher profits for individual traders, and a more efficient market overall. However, if some traders share false information, profit and market efficiency can suffer as a result of traders acting on incorrect information. We present the Trading Agent Trust Model (TATM) that individual traders employ to detect deceptive traders and mitigate their influence on the individual’s actions. Using the JCAT double-auction simulator, we assess TATM by performing an experimental evaluation of traders sharing information about their actions over a social network in the presence of deceptive traders. Results indicate that TATM is effective at mitigating traders sharing false information, and can increase the profit of TATM traders relative to non-TATM traders.

1 Introduction

Niu et al. [6] demonstrate that competition between marketplaces is reflected directly by the migration of traders between those marketplaces. Traders migrate based on estimates of expected profits, derived from the trader’s past experience with that specialist. Individual traders can improve their strategies based on shared information. Intra-marginal traders, that is, sellers (buyers) whose shout price is below (above) the market clearing price and who are therefore successfully matched, could communicate to fellow intra-marginal traders about marketplaces that are highly efficient. This would lead to an increase in the number of intra-marginal traders in that marketplace, thus increasing profits for both trader and marketplace. Furthermore, intra-marginal traders can communicate to extra-marginal traders, that is, sellers (buyers) whose shout price is above (below) the market clearing price and who are therefore not matched, which provides the extra-marginal traders with some bounds on the market clearing price in a given marketplace.

* Corresponding author

However, traders can share false information. For example, deceitful sellers can communicate that a successfully matched trade had a higher price than is true, encouraging extra-marginal buyers and sellers to increase their price to obtain a match. In addition, deceitful sellers can falsely claim a high price was obtained in another marketplace, thus encouraging other intra-marginal sellers in their marketplace to migrate away, leaving less competition. Such false information has the potential to disrupt a market by increasing the profit of deceitful traders at the expense of other traders and market specialists themselves.

In this paper, we present the Trading Agent Trust Model (TATM) that individual traders employ to detect deceptive traders and mitigate their influence on the individual’s actions. TATM is a simple trust mechanism based in part on the FIRE model [2]. Traders employing TATM receive information from their neighbours on a social network that outlines their shout information from the previous trading day, such as price and marketplace choice. TATM traders will mimic their neighbours on some trading days, and use their own success to judge whether their neighbour is truthful or deceitful.

Using the JCAT double-auction simulator [6], we experimentally evaluate TATM traders sharing information about their actions over a social network in the presence of deceptive traders. Results indicate that TATM is effective at mitigating traders sharing false information. The profit of deceitful traders is reduced in the presence of TATM traders but, in most cases, still remains higher than that of truthful traders, and the profit of TATM traders is increased compared to naïve traders that employ no trust model.

2 Related Work: Trust and Reputation Mechanisms

Enhancing decision making in trading markets by mimicking successful peers has the potential to improve both individual agent performance and market efficiency. However, notions of trust and reputation must be considered if reliable estimates of peer ability are to be constructed. In this section, we briefly review key trust and reputation models from the multi-agent systems literature that are applicable in this domain.

McKnight and Chervany [4] identify four primary categories of trust: competence, integrity, benevolence, and predictability, which can be used to facilitate effective interactions and cooperation between agents. Typically, trust models consider a variety of information sources that are combined to determine a measure of trust according to the specific preferences of the agent. Closely related to this trust metric is an agent’s reputation, which can be thought of as an assessment based on the history of interactions with, or observations of, other agents [2, 9].

In an early trust model, Marsh [3] considered a local trust dimension derived directly from agent interactions. The trust value was simply a probability value based on independent criteria with additional ad hoc factors associated with risk and importance. Mui [5] proposed a model that extended this idea by incorporating a multi-part reputation metric derived from components embedded in a social network. In this model, individual reputation could be derived from direct observation of other agents, or from inferences based on information gathered from a social network.

In subsequent work, Smith and Desjardins [7] incorporated aspects of competence and integrity into a formal framework for decision making based on trust and reputation. Two key phases were used in their framework: (i) assessing the capability of an agent to fulfil its stated commitments (competence and integrity) based on observation; and (ii) applying this knowledge to make effective decisions when interacting with the agents observed.

Perhaps the most well-known trust and reputation model in the multi-agent systems community is FIRE, a model that integrates four different information sources to produce a comprehensive assessment of an agent’s likely performance [2]. FIRE uses a single composite trust-reputation value derived from: interaction trust, role-based trust, witness reputation and certified reputation. Direct interactions between agents can be used to calculate trust-reputation values. However, social network interactions (witness and certified) are also used in calculations when direct interactions between any pair of agents have not occurred.

A notable trust model from the recommender systems domain is proposed by Walter et al. [9]. In their model, agents use their social network to gather information and use trust relationships to filter the collected information. Recommendations from neighbours may be received directly or indirectly via the larger pool of connected agents in the network. The trust values provide a ranking of the recommendations received. A probabilistic selection mechanism is then used for decision making.

3 The Trading Agent Trust Model

TATM employs a trust and reputation mechanism that aims to detect deceptive traders on a social network used for sharing trade information. Fundamentally, TATM is a reinforcement learning-based model, consisting of three components:

1. a return updating policy for estimating the trustworthiness of its neighbours, based on the interactions it has had with these neighbours;
2. an action choosing policy for deciding which neighbour on the social network is to be imitated in the next round; and
3. a decision-making strategy for mimicking the marketplace selection and last shout placed by a neighbour.

3.1 Return updating policy

The return updating policy employed in TATM is based on a component of the FIRE model [2], presented in Section 2. Interaction trust is built from the direct experience of an agent. Specifically, each agent rates its partner’s performance after every transaction and stores its ratings. When an agent requires the trust value of another agent, it calculates this based on the past ratings using a rate weighting function that favours more recent interactions. The rating recency function is given by the formula:

$$ w(r_i) = e^{-\Delta t(r_i) / \lambda} \tag{1} $$

in which $r_i$ is the rating of a particular interaction, $\Delta t(r_i)$ is the time that has elapsed since that interaction, and $\lambda$ is a parameter used to modify the decay of a rating (a lower value of $\lambda$ means that older ratings are weighted lower).

The return updating policy must be customised for a particular domain. In JCAT, traders are given an upper bound on the number of commodities they can trade each day. On a given day, each trader attempts to trade as many of these as possible at its shout price in a specified marketplace. The shout price, marketplace, and number of trades made are shared on the social network after each trading day, as well as how many successful trades were made.

In TATM, a trader mimics a neighbour’s shout (see Section 3.2 for a discussion of neighbour selection) by using the same shout price and marketplace (all traders have the same maximum number of trades). After playing this strategy, the trader updates its feedback for that neighbour using a parameter $\epsilon$:

1. $r_i = -\epsilon$ if the trade resulted in a smaller number of trades than specified by the neighbour. We refer to this as a deceptive case.
2. $r_i = \epsilon$ if the trade resulted in at least the number of trades specified by the neighbour. We refer to this as a non-deceptive case.

However, in a double auction, traders can take advantage of their private information to preemptively determine deceit. For example, if a seller receives information that a neighbour had sold its allocation at price $q$ in market $m$, and the seller itself placed shouts at price $p$, where $p < q$, also in market $m$, that were not matched, then it is highly likely that the neighbour is attempting to deceive; otherwise it is likely that the seller would also have received successful matches. As a result, TATM uses the following preemptive rules:

1. $r_i = -2\epsilon$ if a seller (buyer) indicates a successful shout $\langle q, m \rangle$, and the trader’s own shout $\langle p, m \rangle$, where $p < q$ ($p > q$), was not matched.
2. $r_i = -\epsilon$ if a seller (buyer) indicates a successful shout $\langle q, m \rangle$, and the trader’s own shout $\langle p, n \rangle$, where $p < q$ ($p > q$), was not matched. The traders are in different markets ($m$ and $n$), so we are less sure that the neighbour is deceptive.

In each of these cases, the neighbour’s trade is not mimicked, and the trader reverts to its underlying strategy. This feedback policy is simple and does not detect the degree to which a neighbouring trader is deceptive by, for example, measuring the difference in the number of trades.

3.2 Action-choosing and decision-making policies

To choose which neighbour to mimic, a trader calculates a score for each neighbour as a function of its trust value for that neighbour and the neighbour’s claim of its performance on the previous trading day. We consider only the previous day for simplicity. The trust of each neighbour $a$, which we will call the neighbour’s Q-value, is given by the average of all interactions (Equation 1):

$$ Q(a) = \frac{\sum_{i=1}^{n} w_a(r_i)}{n} \tag{2} $$

in which $n$ is the number of interactions between the trader and its neighbour.

Each trader should explore its neighbours to obtain recent feedback, but should also exploit the knowledge it has built up over time. This exploit-versus-explore dilemma is addressed using a softmax action selection method that uses a Boltzmann distribution [8]. Using this strategy, an agent explores its environment first and then gradually moves its stance towards exploitation as it learns more about the environment. Thus, the trading agent chooses an action $a$ with the probability:

$$ P(a) = \frac{e^{Q(a)/\tau}}{\sum_{b=1}^{n} e^{Q(b)/\tau}} \tag{3} $$

in which $Q(a)$ is the value of an action based on feedback from previous applications of that action (in our case, interactions with a neighbour), and $\tau$ is a positive number that dictates how much of an influence the past data has on the decision. A high value of $\tau$ specifies a low influence, making all neighbours nearly equally likely to be chosen, while a low value causes the selection probabilities to follow the $Q(a)$ values closely. A parameter, $\alpha \in (0, 1]$, specifies a rate of decay such that after each action, the value of $\tau$ becomes $\tau' \cdot \alpha$, in which $\tau'$ is the value of $\tau$ in the previous round.

The selection of a neighbour is governed by the following formula:

$$ A = \begin{cases} \max\{a \mid \mathit{profit}(a) \times P(a)\} & \text{(where the trader is a seller)} \\ \min\{a \mid \mathit{profit}(a) \times P(a)\} & \text{(where the trader is a buyer)} \end{cases} \tag{4} $$

in which $\mathit{profit}(a)$ is trader $a$’s profit from the previous day (see Equation 7). Therefore, the score for each neighbour is the product of its shout price from the previous day and its trust value relative to other neighbours. A seller (buyer) chooses the neighbour with the highest (lowest) score. If and only if the neighbour’s shout price is greater than the trader’s, the trader will mimic the neighbour.
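The policies of Sections 3.1 and 3.2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the JCAT implementation: all function names are hypothetical, `eps` stands for the feedback parameter of Section 3.1, and the term $w_a(r_i)$ in Equation 2 is read here as the rating multiplied by its recency weight, which is an assumption. The neutral Q-value for a neighbour with no history is also an assumption.

```python
import math

def recency_weight(elapsed, lam):
    """Equation 1: weight of a rating made `elapsed` days ago."""
    return math.exp(-elapsed / lam)

def feedback(own_trades, claimed_trades, eps):
    """Feedback after mimicking: -eps (deceptive case) or +eps."""
    return -eps if own_trades < claimed_trades else eps

def preemptive_rating(is_seller, claim, own, own_matched, eps):
    """Preemptive rules; `claim` and `own` are (price, market) shouts.

    Both traders are assumed to be of the same type (seller or buyer).
    Returns None when no rule fires, so the neighbour may still be mimicked.
    """
    (q, m), (p, n) = claim, own
    beats_claim = p < q if is_seller else p > q
    if own_matched or not beats_claim:
        return None
    return -2 * eps if m == n else -eps

def q_value(ratings, today, lam):
    """Equation 2, reading w_a(r_i) as rating times recency weight (assumption).

    `ratings` is a list of (day, rating) pairs for one neighbour.
    """
    if not ratings:
        return 0.0  # assumption: a neighbour with no history is neutral
    weighted = sum(r * recency_weight(today - day, lam) for day, r in ratings)
    return weighted / len(ratings)

def selection_probabilities(q_values, tau):
    """Equation 3: Boltzmann (softmax) distribution over neighbours."""
    top = max(q_values.values())  # shift by the maximum for numerical stability
    exps = {a: math.exp((q - top) / tau) for a, q in q_values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def choose_neighbour(claimed_profit, probs, is_seller):
    """Equation 4: sellers take the highest score, buyers the lowest."""
    def score(a):
        return claimed_profit[a] * probs[a]
    return max(probs, key=score) if is_seller else min(probs, key=score)

# Hypothetical history: neighbour "n1" was reliable, "n2" was not.
history = {"n1": [(1, 0.1), (2, 0.1)], "n2": [(1, -0.1), (2, -0.2)]}
q = {a: q_value(rs, today=3, lam=5.0) for a, rs in history.items()}
probs = selection_probabilities(q, tau=0.1)
best = choose_neighbour({"n1": 40.0, "n2": 45.0}, probs, is_seller=True)
```

In this hypothetical run the seller selects `n1`: although `n2` claims the higher profit, its negative ratings give it a much lower selection probability. After each choice, $\tau$ would be multiplied by $\alpha$ before the next round.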

4 Experimental Setup

We use JCAT 0.17³ to run CAT simulations to examine the effectiveness of the TATM model. We measure the average daily profit for each type of trader, as well as the global allocative efficiency, which is a measure of social welfare.

4.1 Traders

We implemented four different types of trading agents: 1) a naïve truthful trader (no trust model); 2) a naïve deceptive trader; 3) a TATM truthful trader; and 4) a TATM deceptive trader.

³ http://jcat.sourceforge.net/

Deceptive traders. Deceptive traders deceive their neighbours by modifying their shout information before sharing it with their neighbours. There are two pieces of information that are modified: shout price, and the number of matches achieved. A parameter, $\delta > 0$, specifies the amount by which this information is modified. Given a shout of $p$ in which the trader received $n$ matches, a deceptive trader will share the false information:

$$ \begin{cases} \langle p \times (1 + \delta),\ n \times (1 + \delta) \rangle & \text{(where the trader is a seller)} \\ \langle p \times (1 - \delta),\ n \times (1 + \delta) \rangle & \text{(where the trader is a buyer)} \end{cases} \tag{5} $$
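A minimal sketch of the falsified report in Equation 5 follows. The function name and the example values are hypothetical. The source does not say whether the reported number of matches is rounded or capped at the daily trading limit, so the sketch leaves it unmodified.

```python
def deceptive_report(price, matches, delta, is_seller):
    """Equation 5: the (price, matches) pair a deceptive trader shares.

    Sellers overstate the price and buyers understate it; both overstate
    the number of matches.
    """
    price_factor = 1 + delta if is_seller else 1 - delta
    return price * price_factor, matches * (1 + delta)

# Hypothetical shout: price 80 with 2 matches, deceit level 0.5.
seller_claim = deceptive_report(80.0, 2, 0.5, is_seller=True)
buyer_claim = deceptive_report(80.0, 2, 0.5, is_seller=False)
```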

That is, a seller will attempt to raise the general price level of the market, while a buyer will attempt to lower it, thus pushing intra-marginal traders of the same type to be extra-marginal, and inducing traders of the opposite type into the intra-marginal range.

Naïve traders. Naïve traders employ a system in which they simply choose the neighbour with the best offering. That is, they have no learning mechanism to determine the trustworthiness of neighbours, and they simply choose a neighbour to mimic using the following:

$$ A = \begin{cases} \max\{a \mid \mathit{profit}(a)\} & \text{(where the trader is a seller)} \\ \min\{a \mid \mathit{profit}(a)\} & \text{(where the trader is a buyer)} \end{cases} \tag{6} $$

Underlying strategies. In our experiments, when a trader chooses not to mimic a neighbour, it employs its own underlying strategies: the zero-intelligence constrained strategy for shout prices, which chooses a random value between the minimum and maximum of the price range, provided that this does not result in a loss for the trader, and random market selection.

4.2 Experiment variables and parameters

The independent variable of the experiment is the type of trader. We run two sets of experiments: one in which all traders are naïve (the non-TATM markets), and one in which all traders employ TATM (the TATM markets). To help generalise the results, we vary other parameters in the experiment. For both sets of experiments, we modify the following two parameters:

1. Number of deceptive traders ($\xi$): we vary the ratio of deceptive traders to non-deceptive traders in the market from 0.1 to 0.9, in intervals of 0.1.
2. Deceit level of deceptive traders ($\delta$): we vary the degree to which deceptive traders exaggerate their success from 0.1 to 1.0, in intervals of 0.1.

We run all pairwise combinations of these parameters, resulting in 90 different configurations in each experiment. Each configuration is run 30 times and each game lasts 400 days. The results presented in the next section are averaged over the total 12,000 days. Traders are each allowed to trade three units of goods each day and their private values are drawn from the uniform distribution between 50 and 100.

Other parameters are held constant. Each marketplace runs a continuous double auction [1]. We run five marketplaces in each experimental run, in which each marketplace charges at a different level on the profit of traders: 0%, 20%, 40%, 60%, and 80% respectively. Traders operate on a 14 × 14 toroidal grid social network, evenly divided between sellers and buyers, and with neighbours randomly assigned; that is, on aggregate, sellers are connected to an even number of buyers and sellers.
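The size of the parameter sweep can be checked with a short sketch. The variable names below are hypothetical and the snippet only enumerates the configurations described above; it does not run any simulation.

```python
from itertools import product

# Ratio of deceptive traders (xi): 0.1 to 0.9 in steps of 0.1.
deceptive_ratios = [round(0.1 * k, 1) for k in range(1, 10)]
# Deceit level (delta): 0.1 to 1.0 in steps of 0.1.
deceit_levels = [round(0.1 * k, 1) for k in range(1, 11)]

# All pairwise combinations of the two parameters.
configurations = list(product(deceptive_ratios, deceit_levels))

RUNS_PER_CONFIGURATION = 30
DAYS_PER_GAME = 400
days_per_configuration = RUNS_PER_CONFIGURATION * DAYS_PER_GAME
```

The list holds 9 × 10 = 90 configurations, and each configuration is averaged over 30 × 400 = 12,000 trading days.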

Measures. In these experiments, we record two measures. First, we measure the mean trader type profit, which is the mean daily profit over all simulations of each type of trader (random, TATM, and deceptive). The daily profit for a trader $a$ is:

$$ \mathit{profit}(a) = \begin{cases} (n \times |v_a - p_a|) - f_a & \text{(where } p_a > 0\text{)} \\ -f_a & \text{(where } p_a = 0\text{)} \end{cases} \tag{7} $$

in which $v_a$ is the private valuation of trader $a$, $p_a$ is the price of the trade made by trader $a$, $n$ is the number of successful trades, and $f_a$ are the fees paid by trader $a$. In the case that a trader does not make a successful trade that day, they lose the fees charged by the marketplace. The mean daily profit of a trader type on a single day is:

$$ P = \frac{\sum_{i=m}^{n} \mathit{pr}_i}{N} \tag{8} $$

in which traders $m..n$ are the traders of a particular type, $\mathit{pr}_i$ is the daily profit of trader $i$ (Equation 7), and $N$ is the number of traders of that type.

Second, we measure the global allocative efficiency, which measures how close the entire market is to trading at the equilibrium price, where the equilibrium price is defined as the price at which demand equals supply when all traders offer to buy or sell at their private value, assuming that all traders in the market can trade with each other. The global allocative efficiency is calculated using:

$$ E = \frac{\sum_j \sum_i |v_i^j - p_i^j|}{\sum_j \sum_i |v_i^j - p_0|} \tag{9} $$

in which $p_0$ is the equilibrium price of the market, $v_i^j$ is the private value of trader $i$ in marketplace $j$, and $p_i^j$ is the price paid by trader $i$ in marketplace $j$.
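The three measures can be sketched as plain Python functions. The function names and the sample values are hypothetical. The efficiency function takes a flat list of (valuation, price) pairs, one per trade across all marketplaces, which is one possible reading of the double sum in Equation 9.

```python
def daily_profit(valuation, price, trades, fees):
    """Equation 7: profit for one trader on one day.

    A price of 0 denotes a day without a successful trade, in which case
    the trader simply loses the marketplace fees.
    """
    if price > 0:
        return trades * abs(valuation - price) - fees
    return -fees

def mean_type_profit(profits):
    """Equation 8: mean daily profit over the traders of one type."""
    return sum(profits) / len(profits)

def allocative_efficiency(trades, equilibrium_price):
    """Equation 9: surplus at traded prices relative to the equilibrium price.

    `trades` is a list of (valuation, price) pairs over all marketplaces.
    """
    actual = sum(abs(v - p) for v, p in trades)
    ideal = sum(abs(v - equilibrium_price) for v, _ in trades)
    return actual / ideal

# Hypothetical day: one trader matched three units, another matched none.
matched = daily_profit(valuation=90.0, price=75.0, trades=3, fees=5.0)
unmatched = daily_profit(valuation=90.0, price=0.0, trades=0, fees=5.0)
type_mean = mean_type_profit([matched, unmatched, 10.0])
efficiency = allocative_efficiency([(90.0, 80.0), (60.0, 70.0)], 75.0)
```

With these hypothetical values, the matched trader earns 3 × 15 − 5 = 40, the unmatched trader loses its fee of 5, and the efficiency is 20/30, because both trades settle 5 away from the equilibrium price of 75.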

5 Results

Figure 1 plots the mean daily trader profit (Equation 8) for all 90 configurations of the experiment over the 30 iterations for the non-TATM market. These plots are included to illustrate the effect of the experiment parameters. The plots for global efficiency and for the TATM market look similar, showing a clear downward trend as the deceit level increases, so they are omitted for brevity.

From Figure 1, we can see a clear downward trend in profit as the deceit level of the deceptive traders increases. Surprisingly, the number of deceptive traders has little impact on either trader profit or efficiency. This minimal impact can be explained by the fact that the number of deceptive sellers is in balance with the number of deceptive buyers, and on aggregate, each trader is connected to an equal number of buyers and sellers. As a result, when a seller (buyer) deceives another trader by increasing (decreasing) their previous shout by the specified deceit level, the receiving trader’s new shout is likely to be matched by a trader of the opposite type.

The trend downwards as deceit level increases is expected. The probability of getting a match reduces, first, as the range of shouts starts to increase, and second, as traders’ shout values move around the range instead of moving towards the market equilibrium.

A more important result is the effect of the trust model. Figure 2 shows the difference in mean profit (expressed as a percentage) between deceptive and truthful traders in the non-TATM market (Figure 2a) and in the TATM market (Figure 2b). The horizontal plane shows 0%, making it easier to see the distinction between a negative and positive change. From Figure 2a, we can see that deceptive traders perform better than naïve truthful traders, except in the cases in which the deceit level is 0.9. This sharp spike is likely due to the fact that the profit obtained by the deceptive traders themselves becomes so poor that they will mimic naïve truthful traders. Figure 2b indicates that the impact of deceit can be mitigated using TATM. Truthful TATM traders outperform their deceptive counterparts for deceit levels 0.7 to 0.9, and the difference between the two for other parameters is significantly lower, bottoming at just above 2% compared with almost 6% for the naïve traders.

Figure 3 shows the inter-market comparison of deceptive traders and truthful traders. It is important to note the different ranges on the Z axes between Figures 3a and 3b. Figure 3a shows the percentage change in mean trader profit for truthful agents between the non-TATM markets and the TATM markets. This figure demonstrates that employing the TATM model results in a higher trader profit for all of the parameters, and that the higher the level of deceit, the larger the change. This upward trend arises because, as deceptive agents increase their deceit level, deceit becomes easier to identify. Figure 3b plots the same data for the deceptive traders, showing some interesting results. First, even in the presence of the TATM model, deceit can be beneficial. However, this only holds if there are few other deceptive traders in the market.
We attribute this increase in profit to the fact that the deceptive traders themselves are employing the TATM model, so are less likely to mimic other deceptive traders. As the level of deceptive traders increases, being deceptive becomes less profitable.

[Figure 1: two surface plots of mean trader profit (vertical axis, 34 to 48) against No. Deceptive Traders and Deceit Level. Panels: (a) Truthful; (b) Deceptive.]

Fig. 1: Mean trader profit per type (truthful or deceptive) for the non-TATM market.

[Figure 2: two surface plots of percentage difference in mean profit against No. Deceptive Traders and Deceit Level. Panels: (a) Truthful vs. Deceptive (no trust model), vertical axis from -6.0% to 8.0%; (b) Truthful vs. Deceptive (TATM), vertical axis from 0.0% to 20.0%.]

Fig. 2: Plots of the difference in mean profit between deceptive and truthful agents in the two experiments respectively (intra-market comparison), expressed as a percentage. This is calculated as $(P_B - P_A)/P_A$, where $P_A$ and $P_B$ are the mean trader profits (Equation 8), for A vs. B. Note the different limits on the Z axis. The gray plane is 0%.

[Figure 3: two surface plots of percentage change in mean profit against No. Deceptive Traders and Deceit Level. Panels: (a) Truthful (no trust model) vs. Truthful (TATM), vertical axis from 0.0% to 10.0%; (b) Deceptive (no trust model) vs. Deceptive (TATM), vertical axis from -5.0% to 3.0%.]

Fig. 3: Plots of the percentage change of mean profit between deceptive traders in each experiment, and truthful traders in each experiment (inter-market comparison).

6 Discussion and Conclusions

Our results demonstrate that employing TATM is always preferable to a baseline “no trust” model, as the mean daily profit achieved by TATM traders is higher than that of their naïve counterparts for all experiment configurations.

The TATM model reduces the effects of deceptive traders, but these effects cannot be completely eliminated. The TATM model also helps to mitigate the differences between truthful and deceptive traders. While deceptive traders increased their profit in some experimental runs of the TATM market, this is attributed to the deceptive traders themselves employing the TATM model. However, the difference between the truthful and deceptive traders is smaller in the TATM markets. Market efficiency also improves in the TATM model, except when there is a high number of deceptive traders with a high deceit level. In these particular cases, the deceptive traders perform worse themselves. Overall, the conclusions support our hypothesis that a simple trust model such as TATM can mitigate the problems of deception in markets.

In future work, we plan to investigate indirect information sharing within a social network and to extend the TATM model to handle this. We also plan to investigate how TATM can be improved to further mitigate the effects of deceptive traders.

Acknowledgements. The authors thank Peter McBurney of King’s College London for his insight into this work, and the University of Melbourne Visiting Scholar’s Scheme for funding Peter’s visit to Melbourne.

References

1. Friedman, D.: The double auction institution: A survey. In: Friedman, D., Rust, J. (eds.) The Double Auction Market: Institutions, Theories and Evidence, chap. 1, pp. 3–25 (1993)
2. Huynh, T.D., Jennings, N.R., Shadbolt, N.R.: An integrated trust and reputation model for open multi-agent systems. Autonomous Agents and Multi-Agent Systems 13(2), 119–154 (2006)
3. Marsh, S.: Formalising Trust as a Computational Concept. PhD thesis, University of Stirling (1994)
4. McKnight, D.H., Chervany, N.L.: Trust and distrust definitions: One bite at a time. In: Trust in Cyber-societies. LNCS, vol. 2246, pp. 27–54. Springer (2001)
5. Mui, L.: Computational Models of Trust and Reputation: Agents, Evolutionary Games, and Social Networks. PhD thesis, MIT (2003)
6. Niu, J., Cai, K., Gerding, E., McBurney, P., Parsons, S.: JCAT: A platform for the TAC market design competition. In: Proc. of 7th Int. Conf. on Autonomous Agents and Multiagent Systems. pp. 1649–1650. IFAAMAS (2008)
7. Smith, M.J., Desjardins, M.: Learning to trust in the competence and commitment of agents. Autonomous Agents and Multi-Agent Systems 18(1), 36–82 (2009)
8. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction (1998)
9. Walter, F.E., Battiston, S., Schweitzer, F.: A model of a trust-based recommendation system on a social network. Autonomous Agents and Multi-Agent Systems 16(1), 57–74 (2008)