Specialisation Dynamics in Federated Web Search
Rinat Khoussainov and Nicholas Kushmerick
Department of Computer Science, University College Dublin
Belfield, Dublin 4, Ireland
{rinat, nick}@ucd.ie
ABSTRACT Organising large-scale Web information retrieval systems into hierarchies of topic-specific search resources can improve both the quality of results and the efficient use of computing resources. A promising way to build such systems involves federations of topic-specific search engines in decentralised search environments. Most previous research has concentrated on various technical aspects of such environments (e.g. routing of search queries or merging of results from multiple sources). We focus on organisational dynamics: what happens to the topical specialisation of search engines in the absence of centralised control, when each engine makes individual and self-interested decisions on its service parameters? We investigate this question in a computational economics framework, where search providers compete for user queries by choosing what topics to index. We provide a formalisation of the competition problem and then analyse theoretically and empirically the specialisation dynamics of such systems.
Categories and Subject Descriptors H.3.4 [Information Storage and Retrieval]: Systems and Software—distributed systems; I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence—intelligent agents, multiagent systems
General Terms Management, Economics, Experimentation
Keywords Federated Web search, topic specialisation, competition
1. INTRODUCTION Heterogeneous federations of topic-specific Web search engines are a popular vision for Web search systems of the future [22]. Typically, such search environments consist of a federation of multiple specialised search engines and metasearchers. The specialised search engines provide focused search services in a particular topic
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. WIDM’04, November 12–13, 2004, Washington, DC, USA. Copyright 2004 ACM 1-58113-978-0/04/0011 ...$5.00.
domain. Metasearchers distribute user queries only to the search engines providing the best service for the query (as measured by the suitability of their content or other relevant service parameters). We investigate the topic specialisation dynamics of federated Web search systems with decentralised management of the participating search services. In the following subsections we present the motivation for this research, an overview of the problem domain, and finally the main contributions of this paper. Motivation. Organising large-scale information retrieval systems into topical hierarchies of specialised search services can improve both the quality of results and the efficient use of computational resources. In particular, topic-specific search engines can provide better opportunities for the integration of terminology features (e.g. synonyms) and ontologies. Document relevance is very much person-dependent, which is why personalisation and user modelling are becoming increasingly important in Web search engines1. One can expect personalisation techniques to work better in specialised search, because it is easier to tailor the service to the more homogeneous user audience of a specialised search engine. Since only a topic-specific subset of all available documents is searched for each query in a federated system, the amount of processing required for individual requests can be significantly reduced, resulting in a more efficient use of computational resources. In fact, even the architects of existing large centralised search engines point out that being able to intelligently select a subset of the Web to search will result in much more cost-efficient solutions [18]. Another important advantage of federated search environments is that they can provide access to arguably much larger volumes of high-quality information resources, frequently called the "deep" or "hidden" Web.
The "deep" Web consists of the documents that cannot be crawled by centralised search engines, including non-Web document databases and pay-for-view content. In federated Web search environments, independent search engines can be provided for different "deep" Web resources. Examples include search in various digital libraries, such as those of the IEEE or ACM. Metasearchers give users one-stop access to a large number of such services. There already exist metasearchers that help users search tens of thousands of specialised search engines2. While there may be no theoretical obstacles to building hierarchical topic-specific search systems under the control of a single service provider (e.g. Google could in principle reorganise its system into a hierarchy of topical sub-engines), in practice these efforts may be akin to developing a universal ontology for the Semantic Web. Instead, we may expect that, just as the need for data exchange between heterogeneous and geographically spread computers brought us the Internet, the demand for effective and efficient
1 See e.g. labs.google.com/personalized
2 For instance, www.completeplanet.com
Web search would stimulate the development of the corresponding search infrastructure with multiple independent but interworking service providers. In fact, Freenet and Gnutella can be viewed as examples of this trend. Therefore, decentralised control of participating search services is going to be a distinguishing feature of federated Web search environments. Problem overview. Prior research in distributed and peer-to-peer information retrieval has mainly targeted various technical aspects of federated Web search. Examples are metasearch (query routing) algorithms for finding the best search engines for each query [4, 11, 21], or focused crawling techniques for building topic-specific search indices [6, 15]. We focus on the organisational dynamics of federated Web search environments. How can search providers choose what topics to index in the absence of centralised control? What happens to the topical specialisation of participating search engines when each service provider makes individual and likely self-interested decisions on its service parameters? Somewhat similar concerns regarding decentralised control in large-scale information systems have been analysed for peer-to-peer file sharing (Gnutella) [1] and information filtering systems [13]. However, little work has been done in this direction for federated Web search environments, though the above questions are clearly important for their successful deployment. We adopt an economic framework. There must be a profit-related incentive for search providers to participate in federated environments. Profits are generated from processing search queries by charging searchers and/or advertisers. An important factor that affects the profits of a given engine in a federated search environment is competition with other independently controlled search engines. When there are many engines available, users will send queries to those that provide the best results for the corresponding topics.
Thus, the service offered by one engine influences the queries received by its competitors. We view multiple search providers as participants in a search services market competing for user queries by deciding what topics to index (or what price to charge users and/or advertisers). Deriving competition strategies for engines in such markets is a challenging task. The utility of any local index content change depends on the simultaneous state and actions of the other engines in the system. The uncertainty about competitors, changes in the environment, and the large number of competing engines make the task difficult. In prior work [14], we proposed a multi-agent reinforcement learning (MARL) approach to competing in federated Web search environments. We used a gradient-based RL algorithm to learn competition strategies from interactions with opponent engines in a simplified ad hoc scenario. Contributions. In this paper, we analyse the possible outcomes of such a competition process. We are interested in whether economic mechanisms can support effective decentralised management of topical specialisation in federated Web search environments with independent self-interested service providers, and what measures can be used to resolve potential problems. The main contributions of this paper are as follows: • We present a generic formal framework modelling competition between search engines as a partially observable stochastic game. In particular, we show that our prior experiments [14] constitute a special case of the model here under certain assumptions. • We use a game-theoretic analysis of the competition model to study the possible outcomes from the theoretical perspective. • We use the previous reinforcement learning approach to deriving competition strategies [14] to analyse the competition
[Figure 1: Federated Web search. A user submits a query to the metasearch layer, which performs engine selection, forwards the query to selected search engines (SE), merges the returned search results, and presents aggregated results to the user.]
dynamics empirically in simulation experiments. These experiments model competition between service providers that simultaneously and independently adapt their topic specialisation to improve their individual performance (profits) based on past experience. In the conclusions, we draw parallels between the theoretical and empirical results, and discuss the outlook for future research.
2. MODEL OF COMPETITION
Our model of a federated Web search environment consists of multiple search engines and a metasearch layer (see Fig. 1). All these components can be independently owned and controlled. The search engines may choose what information on the Web to cover, e.g. everything or only a set of specific topics. Consequently, the quality of service for a query on a given topic may vary between engines. To find the best search engine for a query, users submit their requests to the metasearch layer. In the simplest case, the metasearch layer can be represented by a single metasearch engine. In more complex scenarios, there can be a peer-to-peer network of metasearchers. We will not go into the technical details of how the metasearch layer is organised here, but instead focus on its functionality. The metasearchers try to estimate the engines' service quality for a given query based on summary information about their content, and distribute the queries to the most relevant engines. At a minimum, a metasearcher should provide a ranking of search engines in order of relevance to a given request. The user can then manually query the top-ranked engines. More elaborate metasearchers (as shown in Fig. 1) can automatically forward user queries to an appropriate number of high-ranked engines and also merge the returned results, eliminating duplicates. Essentially, we have a search services market where search users are consumers, engines are suppliers, and metasearchers act as brokers. The goal of a search engine is to maximise its profits by deciding what to index or how much to charge users. The engine's profits depend on the user requests received which, in turn, depend on the actions of the other engines in the market. We should note here that in reality the content of some specialised search engines may be determined by external factors and cannot be changed for reasons of competition. For example, the content of a search service for journal articles is determined by the scope of the journal.
While the index of such search engines may still change over time, this process may not be directly related to the actions of other market participants or to variations in user interests. However, we can still consider these engines as competitors with somewhat restricted competition strategies (in the simplest case, a fixed index content that never changes, no matter what happens in the market).
Given the complexity of the problem domain, attempting to make our analysis completely realistic from the very beginning is neither feasible nor reasonable. Thus, we will have to use some simplifying assumptions. Our goal is to select an approach that allows us, in principle, to factor more realistic details into our models in the future. Calculating engine profits. Profit is the difference between the revenues generated by a search service and the costs of the resources used to provide it. We consider two principal sources of income for search engines: search income (charging users for the service) and advertising income (charging Web publishers for directing users to their Web sites). The search income is calculated as a sum of charges for individual requests: cQ, where c is the price per request and Q is the number of requests processed. While there are many possible ways of advertising with search engines, they all follow the same reasoning: if a link to the advertiser's Web site appears in response to a search request, then that request may result in a business transaction between the user and the advertiser. Therefore, advertisers essentially buy queries from search engines. Consequently, the advertising income can similarly be associated with the search requests received by the engine as αQ, where α is the average payment by advertisers per request. We assume uniform valuations of queries here for simplicity. The cost of the resources used to provide the service can be subdivided into crawling, indexing, and searching. We derive our cost formulas by analysing the dependencies between operational requirements and the amount of resources needed. To provide a given response time and throughput for a search service, engines can combine two scaling strategies: partitioning and replication [7]. Partitioning reduces the response time for a single query by partitioning the search index between many machines, thus using more resources in parallel to process the query.
Replication increases throughput by replicating the same portions of the index across many machines, thus utilising more resources to process more requests in parallel. Therefore, the amount of resources (and hence the cost) is proportional to the product of the number of queries processed Q (replication costs) and the number of indexed documents D (partitioning costs). Indexing and crawling are governed by the corresponding index growth and freshness requirements. The index freshness determines the percentage of documents in the index updated per unit of time. For a given freshness requirement, the number of documents crawled and processed to maintain the index is proportional to the index size D. Scalability of crawlers and indexers is achieved by following the same idea of distributing the load between multiple machines. We assume here, rather optimistically, that the network costs are also proportional to the number of documents transferred. Therefore, the cost of indexing and crawling is a linear combination of the number of new documents C added to the index and the index size D. Putting together income and costs, the engine's profit is expressed as U = (α + c)Q − β_1 QD − β_2 D − β_3 C, where Q is the number of queries received in a given time interval, D is the number of documents in the engine's index, C is the number of new documents added to the index during the given time interval, and α, β_1, β_2, β_3 are constants. The examples of the FAST [18] and Google [2] search engines confirm that our cost of search is not far from reality. Note that it follows from this formula that, for given constants, an engine's profit per query U/Q has to decrease eventually as D grows, and becomes negative for D > (α + c)/β_1. This effect accords with the intuition that it is more cost-efficient to search smaller indices for each query, and serves to justify our economic framework for analysing search engine behaviour.
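As a concrete illustration, the profit formula above can be computed directly. The following Python sketch uses made-up constants (the values of α, c and the β's are illustrative assumptions, not taken from this paper) and shows the profit per query turning negative once D exceeds (α + c)/β_1:

```python
# Profit model from the text: U = (alpha + c)*Q - beta1*Q*D - beta2*D - beta3*C.
# All constant values below are illustrative assumptions, not from the paper.

def engine_profit(Q, D, C, alpha=0.01, c=0.0, beta1=1e-6, beta2=1e-4, beta3=1e-3):
    """Profit over one time interval.

    Q -- queries processed, D -- index size (documents),
    C -- new documents added to the index during the interval.
    """
    income = (alpha + c) * Q             # search + advertising income
    search_cost = beta1 * Q * D          # replication x partitioning cost
    maintenance = beta2 * D + beta3 * C  # crawling/indexing costs
    return income - search_cost - maintenance

# Profit per query falls as the index grows and turns negative once
# D exceeds (alpha + c)/beta1 (10000 documents for these constants).
for D in (1000, 5000, 10000, 20000):
    U = engine_profit(Q=1000, D=D, C=0, beta2=0.0, beta3=0.0)
    print(D, U / 1000)
```

Here β_2 and β_3 are zeroed in the loop so that the break-even point D = (α + c)/β_1 is exact; with positive maintenance costs, profit turns negative even earlier.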
Metasearch. We posit a very generic model of what any reasonable metasearch system should do. The discussion here is similar to [10]. The goal of a user is to obtain the information of interest while minimising the costs associated with retrieving this information. We assume that users assign a constant value v to each relevant document. This is, of course, a simplification which we use here for analytical tractability. To understand whether a document is relevant or not, users pay an evaluation cost s. Therefore, the expected value of requesting n results for query q from engine i is V_i(n, q) = n(vP_i(n, q) − s) − c_i, where P_i(n, q) is the expected precision of the result set and c_i is the service price. The goal of the metasearcher is to find the search engine i providing the best request value, max_n V_i(n, q). Following [10], we rely on expected recall-precision curves for estimating the expected precision of a result set P_i(n, q). We assume that search engines produce linearly decreasing recall-precision curves P = p(1 − r), where P is precision, r is recall, and p is a query-independent constant. The idea is that to increase recall, users can ask for more results. However, due to imperfect IR algorithms, this also retrieves more irrelevant documents, thus decreasing precision (see also [10]). The expected recall in a set of n results from engine i can be calculated as r_i = nP_i(n, q) / [D_i P_i(D_i, q)], where D_i is the number of documents in the index of engine i, and D_i P_i(D_i, q) is essentially the expected number of relevant documents indexed by engine i. Substituting r_i and P_i into the recall-precision formula, we obtain P_i(n, q) = p_i D_i P_i(D_i, q) / [D_i P_i(D_i, q) + np_i], where p_i is a query-independent constant (representing the quality of the search algorithm of engine i). Therefore, we can find the request value of engine i: V_i^*(q) = max_n V_i(n, q) = νD_i P_i(D_i, q) − c_i,
where ν = (√v − √(s/p_i))². To provide metasearch for any query q, we use the concept of topics. We assume that there exists a set of T basic topics whose combinations capture the semantics of every document or query. A document is characterised by a vector (d^t) describing the contribution of each of the basic topics, where d^t denotes the weight of topic t in document d. Similarly, a query is represented by a vector (q^t). The weights can be interpreted as implication probabilities using the paradigm of information retrieval (IR) as uncertain inference [23]. According to probabilistic IR [23], the probability that document d is relevant to query q can be expressed as Pr(rel|d, q) = Σ_t d^t q^t. Then we can calculate the expected precision as follows: P_i(n, q) = (1/n) Σ_{d∈R_i(n,q)} Σ_t d^t q^t,
where R_i(n, q) is the result set of n documents returned for query q. Let w_i^t denote the average topic weight in the documents indexed by engine i: w_i^t = (1/D_i) Σ_{d∈i} d^t, where by d ∈ i we mean the set of documents indexed by engine i. Then the expected number of relevant documents indexed by engine i can be calculated as D_i P_i(D_i, q) = D_i Σ_t w_i^t q^t.
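The closed form V_i^*(q) = νD_i P_i(D_i, q) − c_i can be sanity-checked numerically. The sketch below, with arbitrary illustrative values for v, s, p and the relevant-document count R = D_i P_i(D_i, q), maximises V_i(n, q) over n by brute force and compares the result against the closed form:

```python
import math

# Brute-force check of the closed form V* = nu * R - c, where
# R = D_i * P_i(D_i, q) is the expected number of relevant documents
# and nu = (sqrt(v) - sqrt(s/p))**2.  All parameter values below are
# arbitrary illustrative choices, not taken from the paper.
v, s, p, R, c = 1.0, 0.1, 0.8, 200.0, 0.0

def precision(n):
    # Expected precision of the top n results under the linear
    # recall-precision model: P_i(n, q) = p*R / (R + n*p).
    return p * R / (R + n * p)

def value(n):
    # Expected value of requesting n results: n*(v*P - s) - c.
    return n * (v * precision(n) - s) - c

best = max(value(n) for n in range(1, 2000))
nu = (math.sqrt(v) - math.sqrt(s / p)) ** 2
closed_form = nu * R - c
print(best, closed_form)  # the two agree up to integer rounding of n
```

Note that the brute-force maximum is over integer n, so the agreement with the continuous optimum is close but not exact.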
Let us make the following assumptions. (A1) All queries issued by users are pure queries on a single basic topic: for any query q, q^t = 1 and q^{t′} = 0 for all t′ ≠ t. Let q(t) be the pure query on topic t. (A1) can be understood as the case when users simply pick a topic from an offered ontology rather than providing keywords (Yahoo! and similar Web directories are good examples of this scenario). (A2) All documents are only relevant to a single topic among those indexed by a given engine: for a given engine and document d, d^t > 0 and d^{t′} = 0 for all topics t, t′ indexed by this engine with t′ ≠ t. We use D_i^t to denote the number of documents on topic t indexed by engine i, with the total number of indexed documents being D_i = Σ_t D_i^t. Obviously, real documents cover multiple topics. However, we envisage that in heterogeneous federated search environments, engines will index relatively small numbers of topics compared to the total number of topics available, so it is unlikely that an engine will cover more than a single topic in a document. Let g_i^t be the average topic weight for the documents on topic t indexed by engine i: g_i^t = (1/D_i^t) Σ_{d∈i} d^t. Since search indices are built using topic-specific (focused) crawlers, g_i^t can be viewed as a characteristic of the crawler's output. (A3) All search engines have "equally good" crawlers, i.e. if D_i^t = D_j^t, then g_i^t = g_j^t for all i, j, t. (A4) All search engines use information retrieval algorithms producing the same recall-precision curves (i.e. p_i = p_j for all i, j). While the quality of crawlers and retrieval algorithms may actually vary between engines, (A3) and (A4) state that in this paper we only consider competition based on index content and pricing parameters. Note, however, that our model allows for factoring in such differences in principle. Under the given assumptions, w_i^t = g_i^t D_i^t / D_i. Substituting this into the formulas for D_i P_i(D_i, q) and then V_i^*, we obtain that under (A1)-(A4) the request value of engine i for a query on topic t is V_i^*(q(t)) = νD_i^t g_i^t − c_i. We assume that a query on topic t is forwarded to the engine i with the highest V_i^*.
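Under (A1)-(A4), the metasearcher's selection rule thus reduces to ranking engines by νD_i^t g_i^t − c_i for the query topic. A minimal Python sketch of this ranking (the engine data and the value of ν are invented for illustration):

```python
# Metasearch ranking sketch under assumptions (A1)-(A4): for a pure
# query on topic t, engine i's request value is nu * D[i][t] * g[i][t] - c[i].
# Engine data and nu below are made up for illustration.

def rank_engines(topic, D, g, c, nu=0.4):
    """Return (engines sorted by request value, the value of each engine)."""
    values = {i: nu * D[i].get(topic, 0) * g[i].get(topic, 0.0) - c[i]
              for i in D}
    return sorted(values, key=values.get, reverse=True), values

# D[i][t] = documents engine i indexes on topic t; g[i][t] = average
# topic weight of those documents; c[i] = request price.
D = {"A": {"cars": 300}, "B": {"cars": 500, "java": 200}, "C": {"java": 800}}
g = {"A": {"cars": 0.9}, "B": {"cars": 0.9, "java": 0.8}, "C": {"java": 0.8}}
c = {"A": 0.0, "B": 0.0, "C": 0.0}

ranking, values = rank_engines("cars", D, g, c)
print(ranking[0])  # "B": it indexes the most "cars" documents
```

Note that the equal g values across engines reflect (A3): crawler quality depends only on the per-topic index size.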
If several search engines provide the same value for a topic, one is selected at random (hence, they split the queries on this topic equally). Note that so far we have ignored the question of how metasearchers get paid for their services. Possible scenarios include charging users for metasearch (similar to the search income) or charging search engines for inclusion in the metasearch listings (similar to the advertising income). Competition as a stochastic game. The competition proceeds in a series of time intervals (stages). Each stage consists of three phases. In the first phase, search engines allocate computing and network resources for processing user requests and make adjustments to their index content and request price parameters. In the second phase, users submit search requests to the system and the metasearchers distribute them between the engines based on their request values V_i^* in the current time interval. In the third phase, the profit of the participating search engines in the current time interval is calculated based on their resource allocations and the requests received. The performance feedback received is used by the search engines to decide how to allocate resources and adjust service parameters in the next time interval. The decision making is simultaneous and independent. Also, since search engines cannot have unlimited crawling resources, they can only make incremental adjustments to their indices. To simplify the subsequent analysis, we adopt the currently predominant model of Web search where all engines are free for users (i.e. the request price c_i = 0 for all engines i, and profits are generated from advertising). The competition process can be conveniently modelled as a partially observable stochastic game (POSG). A stochastic game (SG) [9] is a tuple ⟨I, S, s_0, (A_i), Z, (u_i)⟩, where I is the number of players in the game, S is a set of game states, s_0 is the initial state, and A_i
is a set of actions available to player i. If a_i is the action chosen by player i, then a = (a_i)_{i=1}^I is a joint action (action profile) and A is the set of possible joint actions in the game. Z is a state transition function S × A × S → [0, 1]; and u_i(s, a) is a utility (or payoff) function of player i: S × A → R. In each period of the game, players simultaneously choose their actions, the state of the game changes according to the transition function, and the players receive payoffs. In our case, the players are search engines. The state of the game at a given stage is determined by the state of the engines' indices D_i = (D_i^t)_t (i.e. the number of documents indexed on each topic) and by the number of requests Q_0^t submitted by users on each topic t: s = ⟨(D_i), (Q_0^t)⟩. A player's action a_i is a tuple ⟨(Ĉ_i^t)_t, (Q̂_i^t)_t⟩ which defines the resource allocations and index adjustments at the given stage. The following index adjustments Ĉ_i^t are available for each topic t: increase or decrease the number of indexed documents D_i^t by 1 ("Grow", "Shrink"), or leave it unchanged ("Same"). Resource allocations depend on the number of user queries that will be processed by the engine. Let Q̂_i^t be the number of queries expected by engine i on topic t at the given stage. Search engines cannot know in advance how many queries they will receive, since this depends on the users as well as on the actions of competitors. We assume that if an engine allocates more resources than it gets queries, then the cost of the idle resources is wasted. If, on the contrary, a search engine receives more queries than expected (i.e. more queries than it can process), the excess queries are simply rejected and the search engine does not benefit from them. Such a model reflects the natural inertia in resource allocation: we cannot buy or sell computing resources "on the fly" for processing of each individual user query.
Thus, the utility of engine i is calculated as u_i = α min(Q_i, Q̂_i) − β_1 Q̂_i D_i − β_2 D_i − β_3 C_i, where Q̂_i = Σ_t Q̂_i^t is the number of queries expected by engine i at the given stage, and C_i = Σ_t C_i^t is the number of documents added to the index during the given time interval, with C_i^t = 1 if Ĉ_i^t = "Grow", else C_i^t = 0. Q_i = Σ_t Q_i^t is the number of queries actually received by engine i, where Q_i^t is the number of queries received on topic t. Using our metasearch rule above, we obtain that for c_i = 0 and assumptions (A1)-(A4), a query on topic t is forwarded to the search engine indexing the largest number of documents on this topic (since the rank V_i^*(q(t)) is determined by the number of documents D_i^t). Consequently, Q_i^t = 0 if ∃j : D_i^t < D_j^t, and Q_i^t = Q_0^t/|B| if i ∈ B, where B = {b : D_b^t = max_j D_j^t} is the set of the highest-ranked search engines for topic t. That is, a search engine does not receive any queries if it indexes fewer documents on the topic than some competitor, and the engines indexing the largest number of documents on the topic split its queries equally between them (due to random selection among the highest-ranked engines; see the metasearch model above). Long-term payoff. The goal of each player in the game is to maximise its long-term payoff. There are several criteria for evaluating a sequence of payoffs received by a player in an SG. In an average payoff SG, the long-term payoff of a player is evaluated as the average payoff over all game stages played: U_i(K) = (1/K) Σ_{k=1}^K u_i^k, where u_i^k is the payoff of player i at stage k. In the next section, we consider the competition over infinitely many stages, when the long-term payoff is evaluated as lim_{K→∞} U_i(K) and is called the limit of means. Players' strategies and observations.
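The query-allocation rule and the stage utility can be sketched together. In the following Python sketch, the engines, query demands, and cost constants are illustrative assumptions:

```python
# One competition stage under the free-search model (c_i = 0): queries on
# each topic go to the engine(s) indexing the most documents on it, ties
# split equally; utility follows the formula in the text.  All numbers
# below are illustrative, not from the paper.

def allocate_queries(topic_demand, indices):
    """indices[i][t] = D_i^t.  Returns Q[i][t], the queries each engine gets."""
    Q = {i: {} for i in indices}
    for t, q0 in topic_demand.items():
        best = max(D.get(t, 0) for D in indices.values())
        winners = [i for i, D in indices.items() if D.get(t, 0) == best and best > 0]
        for i in winners:
            Q[i][t] = q0 / len(winners)  # top-ranked engines split the topic
    return Q

def stage_utility(Q_actual, Q_expected, D, C, alpha=0.01, beta1=1e-6,
                  beta2=1e-4, beta3=1e-3):
    # u_i = alpha*min(Q_i, Qhat_i) - beta1*Qhat_i*D_i - beta2*D_i - beta3*C_i
    return (alpha * min(Q_actual, Q_expected)
            - beta1 * Q_expected * D - beta2 * D - beta3 * C)

indices = {"A": {"cars": 500}, "B": {"cars": 500, "java": 300}}
Q = allocate_queries({"cars": 1000, "java": 400}, indices)
print(Q)  # A and B tie on "cars" (500 each); B alone gets the "java" queries

u_B = stage_utility(Q_actual=sum(Q["B"].values()), Q_expected=900.0, D=800, C=0)
print(u_B)
```

Note the penalty structure: engine B pays β_1 Q̂ D on its *expected* load, so over-estimating demand wastes resources, while under-estimating forfeits the excess queries.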
A player's strategy (policy) in an SG is a function Λ_i : H_i × A_i → [0, 1] that maps histories H_i of the player's observations of the game state and the actions of other players onto a probability distribution over the player's actions
A_i. In our POSG, while each engine can fully observe the state of its own index, it is unreasonable to assume that it can observe the exact index state and actions of its competitors. Instead, observations of the opponents' states reflect the relative positions of the other engines in the metasearcher's rankings, which indirectly gives the player information about the state of their indices. In particular, the following three observations are available for each topic t: Opponents winning – there are players ranked higher for topic t than our search engine (i.e. they index more documents on the topic than we do); Opponents tying – there are players having the same rank as our engine, but no one has a higher rank for topic t (opponents index the same or smaller numbers of documents on the topic); and Opponents losing – the rank of our search engine for topic t is higher than that of the opponents (opponents index fewer documents on the topic than we do). One may ask how a player can obtain information about the rankings of its opponents. This can be done by sending a query on the topic of interest to a metasearcher (as a search user) and requesting a ranked list of search engines for the query. We also assume that players can obtain from the metasearch layer statistics on the previously submitted user queries. These data are used in the calculation of the expected number of queries for each topic. In particular, the number of queries on topic t expected by engine i at stage k equals the number of queries actually submitted by users at the previous stage k − 1, i.e. if D_i^t(k) > 0, then Q̂_i^t(k) = Q_0^t(k − 1).
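The three per-topic observations can be derived from the relative index sizes. In the game a player learns them only indirectly through the metasearch rankings; the sketch below computes them directly for illustration, with made-up index sizes:

```python
# A player's per-topic observation relative to its opponents, as defined
# in the text: opponents winning, tying, or losing.  The index sizes
# below are made-up numbers for illustration.

def observe(me, topic, indices):
    """indices[i][t] = D_i^t.  Return player `me`'s observation for `topic`."""
    mine = indices[me].get(topic, 0)
    others = [D.get(topic, 0) for i, D in indices.items() if i != me]
    top = max(others) if others else 0
    if top > mine:
        return "opponents_winning"   # someone indexes more than we do
    if top == mine:
        return "opponents_tying"     # best opponent matches us exactly
    return "opponents_losing"        # everyone indexes fewer than we do

indices = {"A": {"cars": 500}, "B": {"cars": 500, "java": 300}}
print(observe("A", "cars", indices))  # opponents_tying
print(observe("A", "java", indices))  # opponents_winning
print(observe("B", "java", indices))  # opponents_losing
```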
3. GAME-THEORETIC ANALYSIS In this section, we investigate whether service providers can use theoretical (deductive) reasoning to independently derive strategies that maximise their individual profits. That is, if we were to rely on decentralised control in our Web search economy, what could the resulting outcome be from a theoretical point of view? The optimal behaviour for a player in multi-player games is in general opponent-dependent. A fundamental problem is that a player has no direct control over the behaviour of its opponents. However, it can expect that the other players in the game will attempt to maximise their individual payoffs, i.e. are rational. Therefore, the strategy of a rational player should be a best response to its expectations for the strategies of the other players, and it is optimal only if the player holds the correct expectations. Nash equilibria [16] capture situations in which each player holds the correct expectations about the other players' behaviour and acts rationally. A combination of players' strategies is called a Nash equilibrium if none of the players can benefit by unilaterally deviating. SGs have a rich structure of equilibrium behaviour that may be interpreted in terms of a "social norm" [16]. The idea is that players can sustain mutually desirable outcomes if their strategies involve "punishing" any player whose behaviour is undesirable. This is regulated by the folk theorems3 from game theory. Following [9], a vector (x_i) is called a feasible long-term payoff profile in a limit of means SG with initial state s_0 if there is a combination of players' strategies (Λ_i) such that x_i = û_i(s_0, (Λ_i)) = lim_{K→∞} (1/K) Σ_{k=1}^K û_i(s_0, (Λ_i), k), where û_i(s_0, (Λ_i), k) is the expected payoff of player i at stage k for the given initial state and players' strategies. Let Λ_{−i} denote a strategy profile of all players except i.
We define the minimax payoff µ_i(s_0) of player i in a limit of means SG with initial state s_0 as µ_i(s_0) = min_{Λ_{−i}} max_{Λ_i} û_i(s_0, (Λ_i, Λ_{−i})).
3 A term used in the game theory community, since their originator is apparently unknown.
Essentially, the strategy profile Λ_{−i} corresponding to the minimax is the most severe punishment that the other players can inflict on i for the given initial state. A payoff profile (x_i) is strictly enforceable in an SG with initial state s_0 if it is feasible and x_i > µ_i(s_0) for all i. Intuitively, enforceable profiles are the outcomes where each player can be punished, and so threats of punishment can sustain the outcome as an equilibrium. To play punishing strategies, all players should be able to detect deviations by their opponents. Players in our Web search game cannot observe the exact actions and payoffs of their opponents. However, punishing strategies can still be implemented due to the following important properties of our game: not all deviations by a single player affect the payoffs of its opponents; and any deviation by a single player that affects the payoff of at least one other player in the game can be detected by all players. To affect the payoffs of other players, one needs to affect the number of requests they receive. This requires changing the relative rankings of players by the metasearchers, which can be detected by all players in the game, since these data are part of the players' observations. Let (Λ_i) be a combination of players' strategies such that no player can profitably deviate from its strategy without changing the sequence of relative rankings of players by the metasearch layer at some stage of the game. We call the long-term payoff profile for the given strategy combination (Λ_i) detectable, because the players can detect any rational (profitable) deviations from such a profile. The difficulty in SGs is that, unlike in repeated games, deviations not only alter current payoffs, but may take the game to states in which punishments are ineffective (recall that the minimax depends on the initial state). Also, the set of feasible payoff profiles may change (e.g. it may not be possible to achieve the desired payoff after a punishment). PROPOSITION 1.
The set of all feasible long-term payoff profiles in a limit-of-means Web search game is independent of the initial state.
Proof sketch: For any two states s_1 and s_2 of the game, characterised by the corresponding contents of the engines' indices, we can always construct a strategy profile that changes the current game state from s_1 to s_2. Indeed, since engines are independent and unrestricted in controlling their index contents, they can always increase or reduce the number of indexed documents for each topic appropriately. Let (x_i) be a feasible long-term payoff profile realised by some strategy profile (Λ_i) in the game with initial state s_1. Let (Λ'_i) be a strategy profile in the game with initial state s_2 that first changes the game state from s_2 to s_1 and then follows (Λ_i), and let k_i be the period in which (Λ'_i) starts following (Λ_i). Then for all i,

û_i(s_2, (Λ'_i)) = û_i(s_1, (Λ_i)) − lim_{K→∞} (1/K) Σ_{k=1}^{k_i} û_i(s_1, (Λ_i), k) + lim_{K→∞} (1/K) Σ_{k=1}^{k_i} û_i(s_2, (Λ'_i), k) = û_i(s_1, (Λ_i)),

where both correction terms vanish because each sum contains a fixed, finite number of stage payoffs. Therefore, the payoff profile (x_i) is also feasible in the game with initial state s_2. The same reasoning holds in the opposite direction, so the sets of feasible long-term payoff profiles are the same for any s_1 and s_2.
PROPOSITION 2. The minimax payoff in a limit-of-means Web search game is independent of the initial state for all players.
Proof sketch: Let the strategy profile Λ−i of all engines except i be as follows: each engine j ≠ i indexes D_j^t = α/β_1 documents for each topic t. Then max_{Λ_i} û_i(s_0, (Λ_i, Λ−i)) = 0. That is, the best engine i can do is to index nothing and receive a payoff of 0. Indeed, to receive queries, i needs to index more documents than its opponents: D_i^t > D_j^t (see the metasearch rule). However, it follows from our performance formula that for D_i > α/β_1 the
payoff is negative, no matter how many queries the engine receives. Therefore, from any state the payoff of a given engine can be held down to 0 by its opponents. Since an engine can always achieve a payoff of 0 (by indexing nothing), 0 is the minimax payoff in our game.
COROLLARY 1. Every feasible long-term payoff profile in which each player has a positive payoff in a limit-of-means Web search game is strictly enforceable for any initial state.
We omit more detailed proofs due to lack of space. These propositions are sufficient to apply the folk theorem for stochastic games from [9], with the following result:
PROPOSITION 3. Any detectable long-term payoff profile in which each player has a positive payoff is an equilibrium profile in a limit-of-means Web search game.
That is, any detectable outcome in which all engines are profitable can be sustained as an equilibrium in our Web search game. Given the multiplicity of equilibria, players need to reach a coordinated equilibrium outcome to play optimally. There is a large body of research on equilibrium selection in game theory using various principles and criteria [12]. However, the generic assumption that players are payoff maximisers is not sufficient to select a unique outcome. First, the players need to explicitly adopt more specific concepts of rationality (e.g. payoff dominance, risk dominance), and these beliefs need to be consistent among players. Second, equilibrium selection may ultimately require characterising all Nash equilibria of a game, a task that is NP-hard even for simpler matrix games, and even given complete information about the game [8]. These NP-hardness results, together with the fact that players may not have complete information about the game and other players' beliefs, demonstrate that deductive reasoning alone is not sufficient for service providers to derive optimal competition strategies. These considerations lead us to the idea of "bounded rationality" [19].
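The punishment argument above can be checked numerically. The sketch below uses hypothetical values of α and β_1 and the simplified per-query margin α − β_1·D_i implied by our performance formula (with the other cost coefficients β_2, β_3 set to 0); the numbers are illustrative, not taken from the experiments:

```python
# Sketch of the punishment argument (hypothetical numbers): if every
# opponent indexes D_j^t = alpha/beta1 documents per topic, engine i
# must index strictly more to win any queries, which makes its
# per-query margin alpha - beta1 * D_i negative. Its best response is
# therefore to index nothing and accept the minimax payoff of 0.
alpha, beta1 = 10.0, 0.1     # assumed cost coefficients
D_punish = alpha / beta1     # 100 documents per topic, indexed by punishers

def payoff(D_i, queries):
    """Simplified performance formula: queries * (alpha - beta1 * D_i)."""
    return queries * (alpha - beta1 * D_i)

# To out-rank the punishers, engine i needs D_i > D_punish, e.g. 101 docs:
print(payoff(101, queries=50))   # negative, regardless of the query count
# Indexing nothing yields the minimax payoff:
print(payoff(0, queries=0))      # 0.0
```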
Bounded rationality assumes that decision makers are unable to act optimally a priori due to limited knowledge of the environment and opponents, and limited computational resources. Instead, they have to rely on inductive reasoning, deriving their expectations for the opponents’ behaviour not from theoretical analysis of the game, but from repeated interaction. The goal here is to iteratively derive a strategy that performs well against given opponents. This naturally leads to the idea of multi-agent reinforcement learning (MARL) [20].
4. LEARNING TO COMPETE

Since SGs can be viewed as a generalisation of Markov Decision Processes (MDPs) [3] to multiple controllers, many learning methods for SGs concentrate on extending traditional RL algorithms (such as Q-learning [3]) to multi-agent settings. However, while in MDPs the RL problem is well-defined (since MDPs have optimal policies), in SGs the optimal policy depends on the strategies of the opponents (as already discussed). One approach is to explicitly learn an equilibrium of the game: at each learning iteration, the optimal action is chosen by calculating an equilibrium of the game based on the currently learned game model. The need to select a unique equilibrium, however, makes this approach applicable only to special classes of games, or requires an "oracle" to coordinate the learners (which raises the question of why learning is needed at all). Also, calculating equilibria may be problematic in POSGs. Another approach is to learn the best response to the given opponents, which is more aligned with the idea of bounded rationality. An example of the best-response approach is opponent modelling [5]. A learner builds models of its opponents based on past
observations of their behaviour in the game and then computes the best-response strategy to the obtained models. In [14], we proposed a version of best-response learning using the GAPS RL algorithm in a search engine competition scenario. In GAPS [17], the learner plays a parameterised strategy represented by a non-deterministic finite state automaton (FSA). The FSA inputs are observations of the game; the FSA outputs are the player's actions. The policy parameters are the probabilities of the FSA outputs and state transitions. GAPS implements stochastic gradient ascent in the space of policy parameters: after each learning trial, the parameters of the policy are updated by following the payoff gradient. Essentially, GAPS searches among possible policies for one that is locally optimal, i.e. that yields a higher payoff than other "close" policies.
GAPS has a number of advantages that are important in our POSG. Unlike model-based RL algorithms, GAPS does not attempt to build a model of the game or the opponents from the interaction experience and then derive the optimal behaviour policy from the obtained model. This reduces the information needs of the algorithm, allowing it to cope with partial observability and to scale well with the number of opponents. GAPS also scales well to multiple topics by viewing decision-making as a game with factored actions, where action components correspond to topics. The action space of a player in such games is the product of the factor spaces for each action component. GAPS, however, reduces the learning complexity: rather than learning in the product action space, separate cooperating GAPS learners can be used for each action component. Such distributed learning is equivalent to learning in the product action space [17]. Finally, the policies learned by GAPS can condition their outputs on past observations ("memorised" in the FSA state). This can compensate for the lack of full observations in POSGs, thus improving the policy's performance.
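The parameterised-FSA idea can be sketched as follows. This is a deliberately simplified, hypothetical stand-in for GAPS [17] (fixed observations, a toy reward in place of the game payoff, and none of the distributed factored learners); the policy is a stochastic FSA whose transition/output probabilities are softmax functions of real parameters, updated by a REINFORCE-style payoff-gradient step after each trial:

```python
# Minimal GAPS-style policy-gradient sketch (hypothetical sizes/reward).
import math, random

N_STATES, N_OBS, N_ACTS = 3, 2, 2
random.seed(0)

# theta[s][o] holds logits over joint (next_state, action) choices
theta = [[[0.0] * (N_STATES * N_ACTS) for _ in range(N_OBS)]
         for _ in range(N_STATES)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def run_trial(observe, reward, horizon=10, lr=0.1):
    state, grads, actions = 0, [], []
    for _ in range(horizon):
        obs = observe()
        probs = softmax(theta[state][obs])
        idx = random.choices(range(len(probs)), probs)[0]
        # d log p(idx) / d logit_j = [j == idx] - probs[j]
        grads.append((state, obs, [(1.0 if j == idx else 0.0) - p
                                   for j, p in enumerate(probs)]))
        state, action = divmod(idx, N_ACTS)
        actions.append(action)
    r = reward(actions)
    for s, o, g in grads:                 # stochastic gradient ascent step
        for j, gj in enumerate(g):
            theta[s][o][j] += lr * r * gj
    return r

# Toy reward that prefers action 1 (stands in for the search-game payoff):
for _ in range(500):
    run_trial(observe=lambda: 0, reward=lambda acts: sum(acts))
```

After training, the learned output probabilities (e.g. `softmax(theta[0][0])`) concentrate on the rewarded action, illustrating the local hill-climbing behaviour described above.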
Moreover, past-dependent behaviour is crucial for implementing "punishing" strategies in SGs. As shown in [14], these properties allow GAPS learners to compete successfully against various opponents with fixed strategies in the Web search game. However, we can expect that in real life multiple search providers will be adapting at the same time. The problem for reinforcement learners in this case is that their environment becomes non-stationary: the best response of a player changes as the other players learn simultaneously. In this paper, we are interested in whether such decentralised adaptive control can support effective topical specialisation in federated Web search environments.
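The opponent-modelling flavour of best-response learning mentioned above can be sketched in a few lines. The payoff matrix here is hypothetical (a generic 2×2 stage game, not our Web search game): the learner keeps empirical frequencies of the opponent's past actions, in the spirit of [5], and plays a best response to that empirical model:

```python
# Minimal opponent-modelling sketch (hypothetical 2x2 payoffs): track
# the empirical distribution of opponent actions and best-respond to it.
from collections import Counter

# Assumed stage payoffs for the learner: payoff[my_action][their_action]
payoff = [[3.0, 0.0],
          [1.0, 1.0]]

observed = Counter()            # counts of opponent actions seen so far

def best_response():
    total = sum(observed.values()) or 1
    # Expected payoff of each of my actions against the empirical model
    expected = [sum(payoff[a][b] * observed[b] / total for b in (0, 1))
                for a in (0, 1)]
    return max((0, 1), key=lambda a: expected[a])

# The opponent mostly plays action 1, so action 1 is the best response:
for b in [1, 1, 1, 0, 1]:
    observed[b] += 1
print(best_response())
```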
5. TOPICAL SPECIALISATION IN SELF-PLAY
Assuming that opponents' strategies change in arbitrary ways is not practical. In this paper, we focus on the case of self-play, i.e. when all players use the same learning algorithm, GAPS. Self-play is an important step towards a more generic analysis: ignoring it amounts to the naive assumption that one's opponents are inferior.
Theoretical perspective. Every strict Nash equilibrium (i.e. an equilibrium in which a deviating player is strictly worse off) is a local maximum for policy gradient ascent [17]. Notice that every equilibrium corresponding to a detectable payoff profile with all players receiving positive payoffs is a strict equilibrium, because deviating players can always be punished (i.e. held down to the 0 minimax) for long enough to make them strictly worse off. Therefore, every equilibrium in Proposition 3 is a convergence point for the GAPS algorithm. (However, this does not imply that every local maximum of the gradient ascent is an equilibrium, or that multi-agent learning will actually converge.)
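The convergence claim can be illustrated on a toy game. The example below is a hypothetical 2-topic anti-coordination game, not the full Web search game, and it uses exact expected-payoff gradients rather than GAPS's sampled ones: from a slightly asymmetric start, two independent gradient climbers reach a strict Nash equilibrium in which the players "specialise" in different topics:

```python
# Toy self-play gradient ascent (hypothetical payoffs): each player earns
# 1 by picking a different topic from its opponent, and only 0.2 otherwise.
import math

def sigma(th):
    """Probability of picking topic 1 given logit th."""
    return 1.0 / (1.0 + math.exp(-th))

def dE_dp(q):
    # d E[U] / dp for U = 0.2*p*q + p*(1-q) + (1-p)*q + 0.2*(1-p)*(1-q)
    return 0.8 * (1.0 - 2.0 * q)

theta = [0.1, -0.1]                      # tiny initial asymmetry
for _ in range(2000):
    p, q = sigma(theta[0]), sigma(theta[1])
    # chain rule through the sigmoid; the game is symmetric in roles
    theta[0] += 0.5 * dE_dp(q) * p * (1 - p)
    theta[1] += 0.5 * dE_dp(p) * q * (1 - q)

print(sigma(theta[0]), sigma(theta[1]))  # close to 1.0 and 0.0
```

The strict equilibria (one player on each topic) are attractors of the dynamics, while the symmetric mixed profile is an unstable fixed point, mirroring why detectable profiles with positive payoffs are convergence points for GAPS.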
Since players can only have a finite history memory, in an equilibrium they should exhibit cyclic behaviour, repeatedly playing some action sequence. Taking into account that changing index contents incurs crawling costs, the most cost-efficient equilibrium behaviour is to keep the index unchanged once the equilibrium state is reached. Such outcomes can be characterised qualitatively by the relative ranks r_i^t of each engine i for each topic t in the metasearch layer. In an equilibrium, a rational player will not index any documents for topics on which it does not receive queries (i.e. for which it is not the highest-ranked engine), because indexing such documents consumes resources but brings no profit (queries). On the other topics, a player will index just enough documents to remain (among) the highest-ranked engine(s), because indexing fewer documents for the same number of queries increases the profit (see the performance formula). Let r_i^t = 1 mean that i receives queries on topic t, and r_i^t = 0 otherwise. Then Q_i^t = Q_0^t r_i^t / Σ_j r_j^t and D_i^t = r_i^t D^t, where D^t is the minimum number of documents sufficient to be the top-ranked engine for topic t. Note that when Σ_j r_j^t = 0 (i.e. nobody indexes topic t), we take r_i^t / Σ_j r_j^t = 0/0 = 0 in this notation.
Consider a very simple case of the game with only 2 players and 2 topics. Assume that user interests do not change over time and are uniform: Q_0^1 = Q_0^2 = Q_0. Also, assume β_2 = β_3 = 0. Suppose the players would like to sustain some outcome as an equilibrium of the SG by using punishing strategies. Since the user interests do not change, in an equilibrium Q̂_i^t = Q_i^t for all i, t (each player knows how many queries users will submit and holds correct expectations for the actions of its opponents). Thus, the payoff of player i under the given assumptions is

U_i = Q_0 [ Σ_t r_i^t / Σ_j r_j^t ] [ α − β_1 Σ_t r_i^t D^t ].
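Plugging hypothetical numbers into the payoff formula U_i = Q_0 [Σ_t r_i^t / Σ_j r_j^t][α − β_1 Σ_t r_i^t D^t] for the 2-engine, 2-topic case makes the comparison between specialisation and head-on competition concrete (the values of α, β_1, Q_0 and D^t below are illustrative, chosen so that β_1/α ≥ 0.25):

```python
# Evaluating the 2-engine, 2-topic equilibrium payoff formula for two
# ranking combinations (hypothetical coefficients).
alpha, beta1, Q0 = 1.0, 0.3, 100.0   # note beta1/alpha = 0.3 >= 0.25
D = [1.0, 1.0]                       # min docs needed to top-rank each topic

def U(r, i):
    """Payoff of engine i given rankings r[i][t] in {0, 1}."""
    share, cost = 0.0, 0.0
    for t in (0, 1):
        total = r[0][t] + r[1][t]
        if total:                    # 0/0 counts as 0 by convention
            share += r[i][t] / total
        cost += beta1 * r[i][t] * D[t]
    return Q0 * share * (alpha - cost)

specialise = [[1, 0], [0, 1]]   # each engine top-ranks a different topic
head_on    = [[1, 1], [1, 1]]   # both engines index both topics

print(U(specialise, 0), U(specialise, 1))   # 70.0 each
print(U(head_on, 0), U(head_on, 1))         # 40.0 each
```

With these numbers, full specialisation dominates head-on competition both in total welfare and in each engine's individual payoff, as claimed below.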
Figure 2: 2 engines: self-play (left), challenging Engine 1 (right).
Let us now characterise the possible equilibrium payoff profiles (U_i) quantitatively. First, for an outcome to be an equilibrium, U_i > 0 must hold for all i. Second, for any outcome characterised by some rankings (r_i^t), the payoff of all players is greater for smaller D^t; hence, for given rankings, rational players should prefer the minimum possible D^t. Comparison of the payoff profiles for different ranking combinations (r_i^t) shows that the cases in which each engine specialises in a different topic (e.g. r_1^1 = r_2^2 = 1, r_1^2 = r_2^1 = 0) yield higher payoffs than when the engines compete head-on. Full specialisation results in the highest U_1 + U_2; that is, it maximises game-theoretic social welfare. Also, for β_1/α ≥ 0.25, such outcomes yield the maximum individual payoffs U_i for every player. Therefore, if the engines were to agree on which outcome to sustain, topical specialisation is a justified choice for payoff maximisers.
Empirical results. We used a simulator of the Web search game to analyse the engines' behaviour empirically. We trained 2 learners, each using a 3-state GAPS policy, in self-play in a scenario with 2 basic topics. The experimental setup, however, was more realistic: we used non-zero cost coefficients β_2 and β_3, and the number of user queries changed between game stages and differed between topics. To simulate queries realistically, we used HTTP logs from a Web proxy of a large ISP containing real requests to over 40 existing search engines. We associated topics with search terms in the logs: to simulate queries for T topics, we extracted the T most popular terms from the logs, and the number of queries generated on topic t for a given period (a day) was equal to the number of queries containing term t in the logs for that day.
The learning in self-play converged to relatively stable strategies. The players split the query market: each engine specialised in a different topic. In the next experiment, we fixed the strategy of Engine 1 after self-play and trained another 3-state GAPS learner (the Challenger) against it; the idea is to see what the best response is against the strategy that Engine 1 learned in self-play. The same experiment was then repeated with the strategy of Engine 2 fixed and challenged. Fig. 2 shows how the long-term payoffs of the search engines changed during self-play (left) and while challenging the strategy of Engine 1 (right). At the end of self-play, Engine 1 had learned to specialise in the more popular topic, while Engine 2 picked the less popular topic. The Challenger could do no better than to take the less popular topic (i.e. to replicate the strategy of Engine 2), achieving the same profit as Engine 2 in self-play.

Figure 3: 10 engines: self-play (left), challenging (right).
That is, the best response against Engine 1 that the Challenger found was the strategy of Engine 2. This indicates that Engine 1 did actually implement a punishing strategy which sustained the desired equilibrium (as discussed in Section 3). The picture was analogous when we fixed the strategy of Engine 2 after self-play and trained a Challenger against it.
To analyse how the system size and a growing number of basic topics affect the engines' behaviour, we simulated 10 learners competing in a game with 10 basic topics. Analysis of the engines' behaviour showed that, as in the case of 2 learners and 2 topics, they split the query market, with different engines specialising in different topics. During the challenging phase, we fixed the strategies of the first 5 engines after self-play and trained 5 challengers against them. Fig. 3 shows the engines' learning curves during self-play (left) and challenging (right). Not only did the fixed engines (solid lines) sustain their topic specialisation, but the challengers (dotted lines) also split the rest of the query market between themselves, thus replicating the outcome of the self-play phase. In addition, we performed experiments with varying numbers of engines and topics in the system (5-10 engines, 10-50 topics) with the same qualitative outcome: the players implemented some form of topical specialisation.
6. CONCLUSIONS

Organising large-scale Web information retrieval systems into hierarchies of topic-specific search resources can improve both the quality of results and the efficient use of computing resources. A promising way to build such systems involves federations of topic-specific search engines. An important question, however, is whether such federated environments can effectively manage the topical specialisation of search engines under decentralised control. How can independent search providers choose what topics to index? What happens to the topical specialisation of participating search engines when each search service provider makes individual and likely self-interested decisions on its service parameters?
We investigated these questions in a computational economics framework, where search engines compete with each other for user queries to maximise their individual profits. We provided a formalisation modelling the competition as a stochastic game whose players are search engines. Our game-theoretic analysis suggested that search engines can sustain mutually beneficial outcomes in which they specialise: different engines index different topics, thus targeting different users (i.e. they partition the market of user queries). Empirical results with engines using a multi-agent reinforcement learning (MARL) approach to derive competition strategies showed that:
• learners implement the punishing strategies predicted by the theoretical analysis to sustain mutually desirable equilibrium outcomes;
• topic specialisation between engines emerged in our experiments as a result of multiple decision makers simultaneously striving to improve their individual profits;
• the MARL approach also works in noisy settings (e.g. when user interests are non-uniform and vary over time).
These results demonstrate the self-organisation potential of economically driven decentralised control, and further support the vision of federated heterogeneous Web search environments.
This is clearly just an initial study into the organisational dynamics of federated Web search systems. One future direction will be to relax the assumptions used in our analysis and experiments. In particular, experiments with real documents and using some existing metasearch algorithms should allow us to address the assumptions about the topical content of queries and documents. It would also be interesting to experiment with heterogeneous populations of search engines, where different engines can have different quality of IR algorithms and Web crawlers. Another fruitful avenue for future research may be extending this work to other aspects of decentralised Web information management systems (such as adaptive network configuration and routing for metasearch layers in peer-to-peer information retrieval environments).
7. ACKNOWLEDGEMENTS This research was supported by grant SFI/01/F.1/C015 from Science Foundation Ireland, and grant N00014-03-1-0274 from the US Office of Naval Research.
8. REFERENCES
[1] E. Adar and B. Huberman. Free riding on Gnutella. First Monday, 5(10), 2000.
[2] L. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The Google cluster architecture. IEEE Micro, 23(2), 2003.
[3] D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 1995.
[4] J. P. Callan, Z. Lu, and W. B. Croft. Searching distributed collections with inference networks. In Proc. of the 18th Annual Intl. ACM SIGIR Conf., 1995.
[5] D. Carmel and S. Markovitch. Learning models of intelligent agents. In Proc. of the 13th National Conf. on AI, 1996.
[6] S. Chakrabarti, M. van den Berg, and B. Dom. Focused crawling: A new approach to topic-specific Web resource discovery. In Proc. of the 8th World Wide Web Conf., 1999.
[7] A. Chowdhury and G. Pass. Operational requirements for scalable search systems. In Proc. of the 12th Intl. ACM CIKM Conf., 2003.
[8] V. Conitzer and T. Sandholm. Complexity results about Nash equilibria. In Proc. of the 18th Intl. Joint Conf. on AI, 2003.
[9] P. K. Dutta. A folk theorem for stochastic games. Journal of Economic Theory, 66(1), 1995.
[10] N. Fuhr. A decision-theoretic approach to database selection in networked IR. ACM Trans. on Information Systems, 17(3), 1999.
[11] L. Gravano and H. Garcia-Molina. GlOSS: Text-source discovery over the Internet. ACM Trans. on Database Systems, 24(2), 1999.
[12] J. Harsanyi and R. Selten. A General Theory of Equilibrium Selection in Games. The MIT Press, 1988.
[13] J. O. Kephart, J. E. Hanson, D. W. Levine, B. N. Grosof, J. Sairamesh, R. B. Segal, and S. R. White. Dynamics of an information-filtering economy. In Proc. of the 2nd Intl. CIA Workshop, volume 1435 of Lecture Notes in Computer Science. Springer-Verlag, 1998.
[14] R. Khoussainov and N. Kushmerick. Automated index management for distributed Web search. In Proc. of the 12th Intl. ACM CIKM Conf., 2003.
[15] A. McCallum, K. Nigam, J. Rennie, and K. Seymore. Building domain-specific search engines with machine learning techniques. In Proc. of the AAAI-99 Spring Symposium, 1999.
[16] M. Osborne and A. Rubinstein. A Course in Game Theory. The MIT Press, 1999.
[17] L. Peshkin, N. Meuleau, K.-E. Kim, and L. Kaelbling. Learning to cooperate via policy search. In Proc. of the 16th Intl. Conf. on Uncertainty in AI, 2000.
[18] K. Risvik and R. Michelsen. Search engines and Web dynamics. Computer Networks, 39(3), 2002.
[19] A. Rubinstein. Modelling Bounded Rationality. The MIT Press, 1997.
[20] Y. Shoham, T. Grenager, and R. Powers. Multi-agent reinforcement learning: A critical survey. Technical report, Stanford University, 2003.
[21] A. Sugiura and O. Etzioni. Query routing for Web search engines: architecture and experiments. In Proc. of the 9th World Wide Web Conf., 2000.
[22] H. Tirri. Search in vain: Challenges for Internet search. IEEE Computer, 36(1), 2003.
[23] C. J. van Rijsbergen. Information Retrieval. Butterworths, 1979.