A Signaling Game Approach to Databases Querying and Interaction

arXiv:1603.04068v1 [cs.DB] 13 Mar 2016

Ben McCamish, Oregon State University, [email protected]
Vinod Ramaswamy, University of Colorado Boulder, [email protected]
Arash Termehchy, Oregon State University, [email protected]
Behrouz Touri, University of Colorado Boulder, [email protected]

ABSTRACT

As most database users cannot precisely express their information needs, it is challenging for database querying and exploration interfaces to understand them. We propose a novel formal framework for representing and understanding information needs in database querying and exploration. Our framework considers querying as a collaboration between the user and the database system to establish a mutual language for representing information needs. We formalize this collaboration as a signaling game, where each mutual language is an equilibrium of the game. A query interface is more effective if it establishes a less ambiguous mutual language faster. We discuss equilibria, strategies, and convergence in this game. In particular, we propose a reinforcement learning mechanism and analyze it within our framework. We prove that this adaptation mechanism for the query interface stochastically improves the effectiveness of answering queries and converges almost surely. Most importantly, we show that the proposed learning rule is robust to the choice of the metric/reward by the database.

1. INTRODUCTION

Most users do not know the structure or content of databases and cannot precisely express their queries [18, 8, 13, 17]. Hence, it is challenging for database query interfaces to understand and satisfy users' information needs. Database researchers have proposed methods and systems to help users specify their queries more precisely and to help database query interfaces understand users' intents more accurately [26, 17, 8, 28, 18, 15, 5]. In particular, the database community has deeply investigated several problems that appear in the context of database usability [37, 14, 20, 13, 2, 6, 29].

Current models mainly focus on improving user satisfaction for a single information need. Given a user's information need e, the database system estimates e by various methods, including showing potential results to the user and collecting her feedback [22, 10, 6, 29, 35], asking her questions [2], or suggesting potential queries to her [19]. Nevertheless, many users may explore a database to find answers for various queries and information needs over a rather long period of time. Further, a user may seek the answers to the same query more than once. For example, a scientist may use a reference genetic database to find answers to many queries, some of which may repeat multiple times over the lifetime of a research project. For these users, database querying is an inherently interactive and continuing process.

This setting extends the problem of answering a single information need in two aspects. First, the database system can improve its understanding of how the user expresses her intents progressively over the course of many, potentially repeating, queries. Second, the user may leverage her experience from previous interactions with the database to formulate her future queries. As the user submits queries and inspects or consumes their results, she may gain a better understanding of the database structure and content, which may affect how she formulates queries in the future. Researchers have observed similar behavior in searching text documents [23, 24]. For example, to find the answers for a particular information need, the user may submit an initial, underspecified query, observe its results, and reformulate it according to her observations. After the user finds a SQL query that effectively expresses a frequent intent, she may store the SQL query in a file and call it later to avoid or reduce the burden of query formulation in the future.

Ideally, we would like the user and the database query interface to gradually develop some degree of mutual understanding over the course of several queries and interactions: the query interface should better understand how the user expresses her intents, and the user may become more familiar with the structure and content of the database. Of course, the user and the database system would like to establish a perfect or near-perfect mutual understanding, where the database system returns all or a majority of the desired answers to all or most user queries. An important and interesting question is whether there are inherent limits to establishing these mutual understandings. In other words, one would like to know the degrees to which these mutual understandings are possible. One would also like to explore the characteristics of the information needs, queries, and behavior of the query interface at such limits.
Moreover, it is useful from a practical point of view to find methods and strategies that the query interface can adopt to help establish more effective mutual understandings in a rather small amount of time. To answer the aforementioned questions, one has to precisely define the notions of collaboration and mutual understanding between the user and the database query interface.

In this paper, we propose a novel framework that formalizes the collaboration between the user and the query interface over the course of many interactions. Our framework models this collaboration as a cooperative game between two active and potentially rational agents: the user and the query interface. The common goal of the user and the query interface is to reach a mutual understanding on expressing information needs in the form of queries. The user informs the query interface of her intents by submitting queries, and the query interface returns some results for each query. Both players receive a reward based on the degree to which the returned answers satisfy the information need behind the query. We use standard effectiveness measures from the database systems and information retrieval domains to measure the degree of information need satisfaction [27]. The equilibria and stable states of the game model the inherent limits on establishing a mutual understanding between the user and the query interface. We further explore the strategies that the query interface may adopt to improve user satisfaction. We believe that this framework naturally models the long-term interactions between the user and the database management system and provides the basis for deep theoretical investigation of the problem. More specifically, we make the following contributions in this paper.

• We model the long-term interaction between the user and the query interface as a particular type of cooperative game called a signaling game [9, 30]. The user's strategy in the game is a stochastic mapping from her intents to queries, which reflects the user's decision in choosing a query to express an information need, i.e., an intent. The query interface strategy is a stochastic mapping from queries to results. After each interaction, the database may update its strategy based on the feedback it receives from the user. Since users' degree of rationality is not generally known, we explore the properties of the game both for the case where the user strategy is modified and for the case where it remains unchanged.

• We analyze the equilibria of the game and show that the game has some Nash and strict Nash equilibria. In particular, the game has both Nash and strict Nash equilibria in which the user does not necessarily know all the queries required to express her information needs.

• We propose a reinforcement learning rule, applicable to an arbitrary effectiveness measure, that updates the database strategy. We prove that this learning strategy for the query interface stochastically improves the effectiveness of answering queries and converges almost surely. Our proof shows that the proposed learning rule is robust to the choice of the effectiveness measure, which is a highly desirable property. This adaptation algorithm can be implemented efficiently when the total number of submitted queries is of moderate size.
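The reinforcement learning rule is specified later in the paper. As a rough illustration of the idea of reward-driven strategy updates, here is a minimal Roth-Erev-style sketch in Python; the matrix sizes, the uniform initialization, and the toy reward function are all hypothetical, not the paper's actual construction:

```python
import numpy as np

rng = np.random.default_rng(0)

n_queries, n_results = 3, 4
# Accumulated rewards, initialized uniformly (an assumption for this sketch).
W = np.ones((n_queries, n_results))

def db_strategy():
    """The interface's mixed strategy Q: row-normalized accumulated rewards."""
    return W / W.sum(axis=1, keepdims=True)

def interact(j, reward_fn):
    """One round: answer query j by sampling from Q's row j, then reinforce."""
    Q = db_strategy()
    l = rng.choice(n_results, p=Q[j])  # sample a result for query j
    r = reward_fn(j, l)                # effectiveness feedback in [0, 1]
    W[j, l] += r                       # reinforce the chosen result
    return l, r

# Toy reward (hypothetical): result j is the desired answer for query j.
reward = lambda j, l: 1.0 if l == j else 0.0
for _ in range(2000):
    interact(rng.integers(n_queries), reward)

Q = db_strategy()
print(Q.argmax(axis=1))  # the most probable result per query is the rewarded one
```

After many interactions, probability mass concentrates on the rewarded results, which is the qualitative behavior the convergence result in the paper makes precise.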

2. RELATED WORK

Signaling games model communication between two or more agents and have been widely used in economics, sociology, biology, and linguistics [21, 9, 30, 11]. Generally speaking, in a signaling game a player observes the current state of the world and informs the other player(s) by sending a signal. The other player interprets the signal and makes a decision and/or performs an action that affects the payoff of both players. A signaling game may not be cooperative, i.e., the interests of the players may not coincide [9].

Our framework extends a particular category of signaling games called language games [36, 30, 11] and is closely related to learning in signaling games [16, 34]. These games have been used to model the evolution of a population's language in a shared environment. In a language game, the strategy of each player is a stochastic mapping between a set of signals and a set of states. Each player observes its internal state, picks a signal according to its strategy, and sends the signal to inform the other player(s) about its state. If the other player(s) interpret the correct state from the sent signal, the communication is successful and both players are rewarded. Our framework, however, differs from language games in several fundamental aspects. First, while in a language game every player signals, only one of our agents sends signals. Second, language games model states as an unstructured set of objects, whereas each user intent in our framework is a set of tuples or entities, and different intents may intersect. Third, we use widely used similarity functions between the desired and returned answers of a query to measure the degree of success in answering the query. Fourth, the signals in language games do not possess any particular meaning and can be assigned to every state; a database query, however, restricts its possible answers. Finally, there is no work on language games that analyzes the dynamics of reinforcement learning where players learn on different time scales.

Game-theoretic approaches have been used in various areas of computer science, such as distributed systems, security, and data mining [3, 32, 25]. In particular, researchers have recently applied game-theoretic approaches to model the actions taken by users and document retrieval systems in a single session [23].
They propose a framework to find out whether the user would like to continue exploring the current topic or move to another topic. We, however, explore the development of common representations of intents between the user and the query interface. We also investigate querying and interactions that may span multiple queries and sessions. Moreover, we analyze the equilibria of the game and the convergence rates of some strategy adaptation methods for the database system. Finally, we focus on structured rather than unstructured data.

Avestani et al. have used signaling games to create a shared lexicon between multiple autonomous systems [4]. We, however, focus on modeling users' information needs and the development of mutual understanding between users and database query interfaces in expressing intents in the form of queries. Moreover, as opposed to autonomous systems, the query interface and the user may update their information about the interaction on different time scales. We also provide a rigorous analysis of the game's equilibria and propose novel strategy adaptation mechanisms for the database system.

We have previously proposed the possibility of using signaling games to model the interaction between database users and query interfaces and provided some initial results [33]. The current paper significantly develops our framework and extends our results to practical settings. First, our previous work considers a narrow set of queries that do not generally appear in practice: it assumes that the desired answer to each query is a single tuple. In this paper, we assume that the answers to a query are sets or ranked lists of tuples. Second, our previous model does not address the cases where a returned tuple partially matches a desired one; similarly, it does not handle the cases where returned lists contain a subset of the desired answers. We, however, measure the amount of the user's satisfaction from partially matched results using widely used metrics [27]. We show that the ability to return results that partially match the desired answers of a query significantly impacts the equilibria of the game and the algorithms for updating the strategy of the database system. Third, we did not provide any analysis of the game equilibria in [33].

3. BACKGROUND

Let Attr be a set of symbols that contains the names of attributes [1]. Each relation R is a finite subset of Attr. The arity of R is the number of attributes in R. Let D be a countably infinite domain of values, e.g., strings. Each member of D is called a constant. A relation instance I_R of a relation R with arity k is a finite subset of D^k. Each member of I_R is called a tuple. We call the set of all constants that appear in I_R its active domain and denote it as adom(I_R). A database schema R is a non-empty finite set of relations. A database instance of R is a mapping I_R whose domain is R such that I_R(R) is a relation instance of R for all R ∈ R. The definition of active domain naturally extends to a database instance I; we denote the active domain of I as adom(I).

An atom is a formula of the form R(v1, ..., vn) where R is a relation, n is the arity of R, and each vi, 1 ≤ i ≤ n, is a variable or a constant. A literal is an atom or the negation of an atom. A rule has the form ans(v) ← L1(v1), ..., Lm(vm), where each Li, 1 ≤ i ≤ m, is a literal and ans(v) is an atom, called the head of the rule. Each variable in v occurs in at least one of v1, ..., vm. Each rule has a finite set of literals. A query is a (finite) set of rules that share the same head. A query is over schema R if and only if all its non-head literals are relations or negations of relations in R. We denote the result of evaluating query s over database instance I_R as s(I_R). To simplify our analysis and without loss of generality, we consider only minimal queries in this paper [1]; our results extend to non-minimal queries.

For a positive integer m ≥ 1, we denote [m] := {1, ..., m}. For a vector u ∈ R^m, we denote the ith entry of u by u_i. Similarly, we denote the (i, j)th entry of an m × n matrix P (i ∈ [m], j ∈ [n]) by P_ij. We say that P is a row-stochastic matrix (or simply a stochastic matrix) if it is nonnegative (i.e., P_ij ≥ 0 for all i, j) and Σ_{j=1}^n P_ij = 1 for all i ∈ [m]. We denote the set of all m × n stochastic matrices by L_mn. For an event A of a probability space Ω, we use 1_A for the indicator function on the set A, i.e., 1_A(ω) = 1 if ω ∈ A and 1_A(ω) = 0 if ω ∉ A.
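Membership in L_mn can be checked mechanically. A small Python helper, written for illustration (the function name is our own, not part of the paper):

```python
import numpy as np

def is_row_stochastic(P, tol=1e-9):
    """True iff P is nonnegative and every row sums to 1 (i.e., P is in L_mn)."""
    P = np.asarray(P, dtype=float)
    return bool((P >= 0).all() and np.allclose(P.sum(axis=1), 1.0, atol=tol))

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 1.0, 0.0]])   # a 2 x 3 matrix in L_{2,3}
print(is_row_stochastic(P))                      # True
print(is_row_stochastic([[0.5, 0.6], [0, 1]]))   # False: first row sums to 1.1
```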

4. SIGNALING GAME MODEL

4.1 Intent, Query, and Result Paradigm

4.1.1 Intent

Similar to other works on database usability, we focus on the challenges of formulating queries over a database that stem from the user's lack of knowledge about the structure and/or content of the database. We assume that the database system provides a sufficiently expressive query language for the user to express her intents; thus, if the user is able to express an intent in the form of a query, the database system will return the user's desired tuples. Hence, we define an intent as a query in a fixed query language. The set of possible queries in a standard query language, e.g., CQ or UCQ [1], over a database instance is infinite. Nevertheless, in practice a user has only a finite number of intents over a database instance in a finite period of time. Hence, we assume that the number of intents of a particular user is finite. We index each intent over database instance I by 1 ≤ i ≤ m. We assume that the set of intents has a prior probability distribution π ∈ R^m with π_i ≥ 0 for all i and Σ_{i=1}^m π_i = 1, where π_i is the probability that the user has intent e_i in mind. Without loss of generality, we can assume that π_i > 0 for all i ∈ [m]; otherwise, we can restrict our analysis to the set {e_i | π_i > 0}.

4.1.2 Query

Because users do not usually have complete information about the schema and/or content of the database, they may submit queries that are not the same as their intents [18, 7, 19, 13, 2]. Having intent e in mind, the user may not be able to formulate e and may instead submit a query s ≠ e to the database. Of course, the user still expects the database system to return the answers of intent e for her submitted query. Intent e and its query s generally belong to the same query language. It is also possible that users, due to their lack of knowledge or time, use only a subset of the query language of e. For instance, a user may submit only CQ or UCQ queries to the database. Our framework and results are orthogonal to these conditions. As with the set of possible intents, the set of queries over a database instance is infinite. Nevertheless, a user in practice submits only a finite number of queries in a finite time period. Hence, we assume that the set of all queries submitted by a user is finite. We index each query over database instance I by 1 ≤ j ≤ n.

4.1.3 Result

Given a query s over database instance I, the query interface generally returns a set or bag of tuples as the answer to s. An obvious choice is to return s(I). Because the query interface knows that the input query may not precisely specify the user's intent, it also considers alternative answers to satisfy the information need behind the query as much as possible [18, 7, 13]. Thus, it may return sets of tuples other than s(I).

DEFINITION 4.1. A result t over database instance I is a finite relation instance such that adom(t) ⊂ adom(I).

Because we consider only relation instances with finite arities and adom(I) is finite, the set of all results over a database instance is finite. We index each result over database instance I by 1 ≤ ℓ ≤ o.

Given that the query interface returns result t for intent e, we need some metrics to measure how effectively t answers e. There are standard metrics in database systems and information retrieval that measure the user's satisfaction given a returned set of tuples [27, 7]. Let a tuple be relevant to intent e over database instance I if it belongs to e(I). The precision of a result t for intent e over database instance I is the fraction of the tuples of t that are in e(I); the precision of a result is larger if it contains fewer non-relevant tuples. The recall of a result t for intent e over database instance I is the fraction of the tuples in e(I) that are in t; a result has higher recall if it contains more relevant tuples. One may improve recall by returning more tuples [27]. Because this strategy includes more tuples in the result, it may add more non-relevant tuples and potentially sacrifice precision. Ideally, the returned result should have both perfect precision and perfect recall. F-measure is the harmonic mean of recall and precision; it takes its maximum value for a pair of intent and returned result iff both precision and recall are maximum for that pair.

Users often do not have enough time to go over all returned tuples and inspect only the top-k returned tuples [27]. There are standard rank-based measures that prefer answers in which the relevant tuples appear in the higher positions. For example, precision at k, p@k, is the fraction of relevant tuples among the top-k returned tuples. In these settings, the result of a query is a list of tuples. For simplicity, we assume that each result of a query is a set of tuples. Our theorems and conclusions in Section 5 extend to the case where results are lists of tuples. In Section 5.8, we focus on rank-based metrics and redefine a result as a list of tuples accordingly.

Moreover, some tuples may only partially cover the user's desired information. For example, a user may want to know both the names and grades of some students from a university database, but each tuple in the result may contain only the name of a student. There are standard effectiveness metrics, such as nDCG, that quantify the relevance of each returned tuple using multiple relevance levels, i.e., numbers [27]. Our results in this paper hold for such effectiveness metrics. The aforementioned metrics are defined for the cases where the intent has at least one relevant answer over the database instance I, i.e., e(I) ≠ ∅. If e(I) = ∅, one may use fallout, which penalizes results that contain many non-relevant answers, to measure the effectiveness of returned results.
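The set-based metrics above translate directly into code. The following sketch transcribes the definitions of precision, recall, F-measure, and p@k; the tuples used at the end are illustrative, not taken from the paper's running example:

```python
def precision(result, relevant):
    """Fraction of returned tuples that are relevant, i.e., in e(I)."""
    return len(set(result) & set(relevant)) / len(set(result))

def recall(result, relevant):
    """Fraction of relevant tuples that appear in the result."""
    return len(set(result) & set(relevant)) / len(set(relevant))

def f_measure(result, relevant):
    """Harmonic mean of precision and recall."""
    p, r = precision(result, relevant), recall(result, relevant)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def precision_at_k(ranked_result, relevant, k):
    """Fraction of relevant tuples among the top-k of a ranked result."""
    return len(set(ranked_result[:k]) & set(relevant)) / k

e_I = {("Sarah", "Smith"), ("Mike", "Smith")}       # desired answers e(I)
t   = [("Sarah", "Smith"), ("Sarah", "Connery")]    # a returned result
print(precision(t, e_I))          # 0.5
print(recall(t, e_I))             # 0.5
print(f_measure(t, e_I))          # 0.5
print(precision_at_k(t, e_I, 1))  # 1.0
```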

4.2 Strategies and Rewards

The strategy of the user, P, is a random mapping from the set of her intents to queries. That is, given that the user's intent is e_i, she submits query s_j with probability P_ij, where Σ_j P_ij = 1 for each e_i. The query interface strategy, Q, is a random mapping from the set of queries to the set of results. Formally, given the input query s_j, the query interface returns result t_ℓ with probability Q_jℓ, where Σ_ℓ Q_jℓ = 1 for each s_j. Let r represent an effectiveness measure described in Section 4.1.3. We denote the degree of effectiveness of result t_ℓ for intent e_i as r(e_i, t_ℓ), where 0 ≤ r(e_i, t_ℓ) ≤ 1 for all 1 ≤ i ≤ m and 1 ≤ ℓ ≤ o. Because the goal of both the user and the query interface is to satisfy the user's information need, an interaction yields a reward r(e_i, t_ℓ) for both when the intent and the returned result are e_i and t_ℓ, respectively.

The communication between the user and the query interface can be modeled as a signaling game with identical interests played between them. In this game, the set of strategies of the user is the set L_mn of m × n row-stochastic matrices and the set of strategies of the query interface is the set L_no of n × o row-stochastic matrices. The payoff of the user and the query interface w.r.t. effectiveness measure r is

    u_r(P, Q) = Σ_{i=1}^m π_i Σ_{j=1}^n Σ_{ℓ=1}^o P_ij Q_jℓ r(e_i, t_ℓ).    (1)

The payoff function (1) is the expected payoff of the interaction between the user and the query interface when the user maps a (random) intent e_i to query s_j with probability P_ij and the database maps query s_j to result t_ℓ with probability Q_jℓ. The larger the value of this payoff function, the more likely it is that the query interface returns the desired answers to the users' queries. This payoff structure reflects widely used performance metrics for effective database querying.
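The triple sum in (1) vectorizes naturally. A sketch in Python, where R denotes the m × o matrix with R[i, l] = r(e_i, t_ℓ); collecting the effectiveness values in a matrix is an implementation convenience, not notation from the paper:

```python
import numpy as np

def expected_payoff(pi, P, Q, R):
    """u_r(P, Q) = sum_i pi_i sum_j sum_l P_ij * Q_jl * r(e_i, t_l).

    pi : (m,)   prior over intents
    P  : (m, n) user strategy (intents -> queries)
    Q  : (n, o) query interface strategy (queries -> results)
    R  : (m, o) effectiveness values, R[i, l] = r(e_i, t_l)
    """
    # (P @ Q)[i, l] is the probability that intent e_i ends in result t_l.
    return float(pi @ ((P @ Q) * R).sum(axis=1))

pi = np.array([0.5, 0.5])
P  = np.eye(2)                  # each intent deterministically picks "its" query
Q  = np.eye(2)                  # each query deterministically returns "its" result
R  = np.array([[1.0, 0.0],
               [0.0, 1.0]])     # r(e_i, t_l) = 1 iff l == i
print(expected_payoff(pi, P, Q, R))  # 1.0
```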

5. EQUILIBRIUM ANALYSIS

5.1 Fixed User Strategy

In some settings, the strategy of the user may change on a much slower time scale than that of the database system. In these cases, it is reasonable to assume that the user strategy is fixed. Hence, the game will reach a stable state where the database system adopts a strategy that maximizes the expected payoff. Let a strategy profile be a pair of user and query interface strategies.

DEFINITION 5.1. Given strategy profile (P, Q), Q is a best response to P w.r.t. effectiveness measure r if and only if u_r(P, Q) ≥ u_r(P, Q') for all Q' ≠ Q.

A database strategy Q is a strict best response to P if and only if the inequality in Definition 5.1 is strict.

EXAMPLE 5.2. Consider the example illustrated in Figure 1, which results in the strategy profile shown in Figure 2. The columns t1, t2, and t3 of the query interface strategy in Figure 2(b) are the outputs of applying the intents e1, e2, and e3 in Figure 1(b) to the Student table in Figure 1(c), respectively. The query interface strategy is a best response to the user strategy w.r.t. precision. It is also a strict best response to the user strategy w.r.t. precision.

DEFINITION 5.3. Given strategy profile (P, Q), intent e_i, and query s_j, the payoff of intent e_i using query s_j is u_r(e_i, s_j) = Σ_{ℓ=1}^o Q_jℓ r(e_i, t_ℓ).

DEFINITION 5.4. The pool of intents for query s_j in user strategy P is the set of intents e_i such that P_ij > 0. We denote the pool of intents of s_j as PL(s_j).

Our definition of the pool of intents extends the notion of the pool of states in signaling games [9, 11]. Each result t_ℓ such that Q_jℓ > 0 may be returned in response to query s_j. We call the set of these results the reply to query s_j.

DEFINITION 5.5. A best reply to query s_j w.r.t. effectiveness measure r is a reply that maximizes Σ_{e_i ∈ PL(s_j)} π_i P_ij u_r(e_i, s_j).
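A pure strategy that maps each query to a best reply can be computed by scoring each (query, result) pair. A sketch, using the same matrix conventions as before (R with R[i, l] = r(e_i, t_ℓ) is an implementation convenience); intents outside PL(s_j) contribute zero because P_ij = 0, so the sum may run over all intents:

```python
import numpy as np

def best_response(pi, P, R):
    """Build a pure database strategy mapping each query to a best reply:
    for query j, choose the result l maximizing sum_i pi_i * P_ij * r(e_i, t_l)."""
    n = P.shape[1]
    o = R.shape[1]
    # scores[j, l] = sum_i pi_i * P[i, j] * R[i, l]
    scores = (pi[:, None] * P).T @ R          # shape (n, o)
    Q = np.zeros((n, o))
    Q[np.arange(n), scores.argmax(axis=1)] = 1.0
    return Q

pi = np.array([0.4, 0.6])
P  = np.array([[1.0, 0.0],        # e1 -> s1
               [0.0, 1.0]])       # e2 -> s2
R  = np.array([[1.0, 0.2, 0.0],
               [0.0, 0.3, 0.9]])
print(best_response(pi, P, R))    # s1 -> t1, s2 -> t3
```

When several results tie for the maximum, `argmax` breaks the tie arbitrarily; any tie-breaking choice yields a best response, which mirrors the non-uniqueness of best replies discussed below for precision and recall.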

Query#  Query
s1      ans(x) ← Student(x, 'Sarah', y, z)
s2      ans(x) ← Student(x, y, 'Smith', z)
(a) Queries

Intent#  Intent
e1       ans(x) ← Student(x, 'Sarah', y, 'C.S.')
e2       ans(x) ← Student(x, 'Sarah', 'Smith', y)
e3       ans(x) ← Student(x, y, 'Smith', 'E.E.')
(b) Intents

ID  First Name  Last Name  Major
1   Sarah       Connery    M.E.
2   Mike        Smith      C.S.
3   Sarah       Smith      C.S.
5   Sarah       Smith      E.E.
(c) Database Instance: Student

Figure 1: Queries, intents, and database instance for Example 5.2

      s1  s2
e1     1   0
e2     1   0
e3     0   1
(a) User Strategy

      t1  t2  t3
s1     1   0   0
s2     0   0   1
(b) Database Strategy

Figure 2: Strategy profile for Example 5.2

The following theorem characterizes the best response to a user strategy.

THEOREM 5.6. Given strategy profile (P, Q), Q is a best response to P w.r.t. effectiveness measure r if and only if Q maps every query to its best reply.

PROOF. Since each query is assigned its best reply in Q, no improvement in the expected payoff is possible.

We show the following theorem similarly to Theorem 5.6.

THEOREM 5.7. Given strategy profile (P, Q), Q is a strict best response to P w.r.t. effectiveness measure r if and only if every query has one and only one best reply and Q maps each query to its best reply.

Given intent e over database instance I, F-measure takes its maximum only for the result e(I). On the other hand, other effectiveness measures, such as precision and recall, take their maximum for results other than e(I) as well. For example, given intent e, the precision of every non-empty result t ⊂ e(I) equals the precision of e(I) for e. Hence, there is more than one best reply for an intent w.r.t. precision or recall. Thus, according to Theorem 5.7, there is no strict best response w.r.t. precision or recall.

5.2 Nash Equilibrium

In this section and Section 5.3, we analyze the equilibria of the game where both the user and the query interface may modify their strategies. A Nash equilibrium for a game is a strategy profile where neither the query interface nor the user will do better by unilaterally deviating from its strategy.

DEFINITION 5.8. Strategy profile (P, Q) is a Nash equilibrium w.r.t. effectiveness measure r if and only if u_r(P, Q) ≥ u_r(P', Q) for all P' ≠ P and u_r(P, Q) ≥ u_r(P, Q') for all Q' ≠ Q.

EXAMPLE 5.9. Consider again the intents, queries, database instance, and strategies of Example 5.2. The strategy profile is a Nash equilibrium w.r.t. precision: neither the user nor the query interface can unilaterally change its strategy and receive a better payoff.

EXAMPLE 5.10. Consider the intents and queries illustrated in Figure 3, the previous database in Figure 1(c), and the strategy profile shown in Figure 4. The columns t1, t2, and t3 of the query interface strategy in Figure 4(b) are the tuples returned from applying the intents e1, e2, and e3 in Figure 3(b) to the Student table in Figure 1(c), respectively. Let 0 ≤ δ ≤ 1 and 0 ≤ ε ≤ 1. The strategy profile in Figure 4 is a Nash equilibrium w.r.t. precision.

Query#  Query
s1      ans(x) ← Student(x, 'Sarah', y, z)
s2      ans(x) ← Student(x, y, 'Smith', z)
(a) Queries

Intent#  Intent
e1       ans(x) ← Student(x, 'Sarah', 'Smith', 'C.S.')
e2       ans(x) ← Student(x, 'Sarah', z, 'C.S.')
e3       ans(x) ← Student(x, y, 'Smith', 'E.E.')
(b) Intents

Figure 3: Intents and queries for Example 5.10

      s1   s2
e1    1-δ  δ
e2    1    0
e3    0    1
(a) User strategy

      t1   t2  t3
s1    1-ε  ε   0
s2    0    0   1
(b) Query interface strategy

Figure 4: Nash equilibrium strategy profile for Example 5.10

If the interactions between the user and the query interface reach a Nash equilibrium, the user has no incentive to change her strategy. As a result, the strategy of the query interface and the expected payoff of the game also remain unchanged. Hence, in a Nash equilibrium the strategies of the user and the database management system are stabilized. Moreover, the payoff at a Nash equilibrium reflects a potential eventual payoff for the user in her interactions. Query s_j is a best query for intent e_i if and only if s_j = arg max_{s_k} u_r(e_i, s_k). The following theorem characterizes the Nash equilibria of the game.

THEOREM 5.11. Strategy profile (P, Q) is a Nash equilibrium w.r.t. effectiveness measure r if and only if

• for every query s, s is a best query for every intent e ∈ PL(s);

• Q is a best response to P.

PROOF. Assume that (P, Q) is a Nash equilibrium and that s_j is not a best query for some e_i ∈ PL(s_j). Let s_j' be a best query for e_i. We first consider the case where u_r(e_i, s_j') > 0. We build strategy P' where P'_kℓ = P_kℓ for all entries (k, ℓ) ≠ (i, j) and (k, ℓ) ≠ (i, j'), P'_ij = 0, and P'_ij' = P_ij. We have P' ≠ P and u_r(P, Q) < u_r(P', Q). Hence, (P, Q) is not a Nash equilibrium. Thus, we must have P_ij = 0 and the first condition of the theorem holds. Now, consider the case where u_r(e_i, s_j') = 0. In this case, we also have u_r(e_i, s_j) = 0, which makes s_j a best query for e_i. We prove the necessity of the second condition of the theorem similarly. This concludes the necessity part of the proof. Now, assume that both conditions of the theorem hold for strategies P and Q. Using a similar method, we can prove that there are no strategies P'' and Q'' such that u_r(P, Q) < u_r(P'', Q) or u_r(P, Q) < u_r(P, Q'').

Let Nash_r denote the set of Nash equilibrium strategy profiles w.r.t. effectiveness measure r. The following theorem is an immediate consequence of Theorem 5.11.

THEOREM 5.12. Nash_F-measure = Nash_precision ∩ Nash_recall.

PROOF. Given user strategy P, let DB_F-measure(P) denote the set of all database strategies Q such that (P, Q) is a Nash equilibrium. We have F-measure(e_i, e_k) = 1 iff precision(e_i, e_k) = 1 and recall(e_i, e_k) = 1. Hence, DB_F-measure(P) = DB_precision(P) ∩ DB_recall(P). Also, let User_F-measure(Q) denote the set of all user strategies P such that (P, Q) is a Nash equilibrium; similarly, User_F-measure(Q) = User_precision(Q) ∩ User_recall(Q). Hence, the theorem follows.

Theorem 5.12 shows that, generally, the database system has more alternatives to reach a Nash equilibrium using precision or recall than F-measure.

5.3 Strict Nash Equilibrium

A strict Nash equilibrium is a strategy profile in which both the query interface and the user do strictly worse by unilaterally changing their equilibrium strategies.

DEFINITION 5.13. Strategy profile (P, Q) is a strict Nash equilibrium w.r.t. effectiveness measure r if and only if u_r(P, Q) > u_r(P, Q') for all query interface strategies Q' ≠ Q and u_r(P, Q) > u_r(P', Q) for all user strategies P' ≠ P.

EXAMPLE 5.14. Consider the intents, queries, database instance, and strategy profile of Example 5.2. The strategy profile is a strict Nash equilibrium w.r.t. precision.

EXAMPLE 5.15. The strategy profile in Example 5.10 is not a strict Nash equilibrium w.r.t. precision. One may independently modify the values of δ and ε without changing the payoff of the players.
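The invariance claimed in Example 5.15 can be checked numerically by sweeping δ and ε. The sketch below reuses the payoff function (1) and an abstract effectiveness matrix R chosen as an assumption for illustration (not computed from the Student instance) so that e1 is equally well served along both of its queries:

```python
import numpy as np

def expected_payoff(pi, P, Q, R):
    # u_r(P, Q) = sum_i pi_i sum_j sum_l P_ij * Q_jl * R_il, as in Equation (1)
    return float(pi @ ((P @ Q) * R).sum(axis=1))

pi = np.full(3, 1 / 3)                 # uniform prior over three intents
# Assumed effectiveness values: e1 is equally satisfied by t1, t2, and t3,
# which is what makes the delta/epsilon indifference possible.
R = np.array([[1., 1., 1.],
              [1., 1., 0.],
              [0., 0., 1.]])

payoffs = set()
for delta in (0.0, 0.3, 1.0):
    for eps in (0.0, 0.5, 1.0):
        P = np.array([[1 - delta, delta],    # e1 mixes between s1 and s2
                      [1.0,       0.0],
                      [0.0,       1.0]])
        Q = np.array([[1 - eps, eps, 0.0],   # s1 mixes between t1 and t2
                      [0.0,     0.0, 1.0]])
        payoffs.add(round(expected_payoff(pi, P, Q, R), 12))
print(payoffs)  # a single value: the payoff is independent of delta and eps
```

Because every deviation in δ or ε leaves the payoff unchanged, no such profile can satisfy the strict inequalities of Definition 5.13, which is exactly the argument of Example 5.15.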

The following theorem states the properties of strict Nash equilibria for the game; its proof is similar to that of Theorem 5.11.

THEOREM 5.16. The strategy profile (P, Q) is a strict Nash equilibrium w.r.t. effectiveness measure r if and only if
• every intent e has a unique best query s and the user strategy maps e to it, i.e., $e \in PL(s)$;
• Q is a strict best response to P.

Next, we investigate the characteristics of strategies in a strict Nash equilibrium profile. User strategy P is pure if and only if $P_{i,j} \in \{0, 1\}$ for all i, j. We define pure query interface strategies similarly. A user strategy is onto if and only if there is no query $s_j$ with $P_{i,j} = 0$ for all intents $e_i$. A query interface strategy is one-to-one if and only if it does not map two queries to the same result; in other words, there is no result $t_\ell$ such that $Q_{j\ell} > 0$ and $Q_{j'\ell} > 0$ where $j \neq j'$.

THEOREM 5.17. If (P, Q) is a strict Nash equilibrium w.r.t. satisfaction function r, then
• P is pure and onto;
• Q is pure and one-to-one.

PROOF. Assume there is an intent $e_i$ and a query $s_j$ such that $0 < P_{i,j} < 1$. Since P is row stochastic, there is a query $s_{j'}$ with $0 < P_{i,j'} < 1$. Let $u_r(P_{i,j}, Q) = \sum_{\ell=1}^{o} Q_{j,\ell}\, r(e_i, t_\ell)$. If $u_r(P_{i,j}, Q) = u_r(P_{i,j'}, Q)$, we can create a new user strategy $P'$ with $P'_{i,j} = 1$ and $P'_{i,j'} = 0$ whose remaining entries equal those of P. Because the payoffs of (P, Q) and (P', Q) are equal, (P, Q) is not a strict Nash equilibrium. Now, without loss of generality, assume $u_r(P_{i,j}, Q) > u_r(P_{i,j'}, Q)$. We construct a new user strategy $P''$ whose entries equal those of P except that $P''_{i,j} = 1$ and $P''_{i,j'} = 0$. Because $u_r(P, Q) < u_r(P'', Q)$, (P, Q) is not a strict Nash equilibrium. Hence, P must be a pure strategy. We prove that Q is pure similarly.

If P is not onto, there is a query $s_j$ that is not mapped to any intent in P. Hence, one may change the values in row j of Q without changing the payoff of (P, Q). Assume Q is not one-to-one. Then there are queries $s_i$ and $s_j$ and a result $t_\ell$ such that $Q_{i,\ell} = Q_{j,\ell} = 1$. Because (P, Q) is a strict Nash equilibrium, P is pure and we have either $P_{i,\ell} = 1$ or $P_{j,\ell} = 1$. Assume $P_{i,\ell} = 1$. We can construct a strategy $P'$ with the same entries as P except $P'_{i,\ell} = 0$ and $P'_{j,\ell} = 1$. Since the payoffs of (P, Q) and (P', Q) are equal, (P, Q) is not a strict Nash equilibrium.

Theorem 5.17 extends Theorem 1 in [11] to our setting. In practice, the user generally knows and uses fewer queries than she has intents; in other words, we have m > n. Because the query interface strategy in a strict Nash equilibrium is one-to-one, it does not map some results to any query. Hence, the database will never return some results in a strict Nash equilibrium no matter what

query is submitted. Interestingly, as Example 5.2 suggests, some of these results may be the ones that perfectly satisfy some of the user's intents. That is, given intent e over database instance I, the query interface may never return e(I) in a strict Nash equilibrium. In this situation, the user strategy in a strict Nash equilibrium groups intents with non-empty intersections to address the shortage of queries. The following theorem formalizes this phenomenon.

THEOREM 5.18. Let (P, Q) be a strict Nash equilibrium w.r.t. precision and let there be a uniform prior over the intents in P. For every query $s_j$, there are intents $e_i, e_k \in PL(s_j)$ such that $e_i \cap e_k \neq \emptyset$.

PROOF. Given strict Nash equilibrium (P, Q), assume there is a query $s_j$ such that for every $e_i, e_k \in PL(s_j)$ we have $e_i \cap e_k = \emptyset$. Then, each member of $PL(s_j)$ is a best response to the pool of $s_j$. By Theorem 5.16, (P, Q) is not a strict Nash equilibrium.

Similar results hold for recall and F-measure. As one expects, Theorems 5.16 and 5.17 show that the set of strict Nash equilibrium strategy profiles is a subset of the set of Nash equilibrium strategy profiles for the same satisfaction function. Let $SNash_r$ denote the set of strict Nash equilibrium strategy profiles w.r.t. satisfaction function r. The next theorem follows from Theorem 5.16 and the fact that the results of Theorem 5.12 also hold for strict Nash equilibria of the game.

THEOREM 5.19. $SNash_{F\text{-}measure} = SNash_{precision} \cap SNash_{recall}$.
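The structural conditions of Theorem 5.17 can be checked mechanically for a candidate profile. The sketch below is only illustrative: the matrices, function names, and numerical tolerance are our own choices; it tests the necessary conditions of the theorem (P pure and onto, Q pure and one-to-one), not strict-Nash optimality itself.

```python
def is_pure(M, tol=1e-9):
    """Every entry of a row-stochastic matrix M is (numerically) 0 or 1."""
    return all(abs(x) < tol or abs(x - 1.0) < tol for row in M for x in row)

def is_onto(P, tol=1e-9):
    """No query (column of P) is left unused by every intent."""
    return all(any(P[i][j] > tol for i in range(len(P)))
               for j in range(len(P[0])))

def is_one_to_one(Q, tol=1e-9):
    """No result (column of Q) receives positive mass from two queries."""
    return all(sum(1 for j in range(len(Q)) if Q[j][l] > tol) <= 1
               for l in range(len(Q[0])))

def satisfies_strict_nash_structure(P, Q):
    """Necessary conditions of Theorem 5.17 for a strict Nash profile."""
    return is_pure(P) and is_onto(P) and is_pure(Q) and is_one_to_one(Q)

# Two intents, two queries, two results: a pure/onto, pure/one-to-one profile.
P = [[1.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
print(satisfies_strict_nash_structure(P, Q))        # True

# A mixed user strategy violates purity, ruling out a strict Nash equilibrium.
P_mixed = [[0.5, 0.5], [0.0, 1.0]]
print(satisfies_strict_nash_structure(P_mixed, Q))  # False
```

A profile failing any of these checks cannot be a strict Nash equilibrium; passing them is necessary but not sufficient.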

6. ADAPTATION MECHANISMS FOR FIXED USER STRATEGY

We consider the case where the user does not adapt to the strategy of the database. In many relevant applications, the user's learning happens on a much slower time scale than the learning of the database, so one can assume that the user's strategy is fixed relative to the time scale of the database adaptation. The game introduced in the previous sections raises several questions: i. How can a database learn or adapt to a user's strategy? ii. Mathematically, is a given learning rule effective? iii. What is the limiting behavior of a given learning rule? Here, we address the first and second questions; dealing with the third is beyond the page limits of this paper. For simplicity of notation, we assume for the rest of the paper that the database strategy is an $n \times \tilde{m}$ stochastic matrix, where $\tilde{m}$ can be much smaller than m. The reason for this assumption is that, for practical purposes, the intent set of the user can be arbitrarily large, but we would like to maintain and update the query interface strategy efficiently; hence, the number of entries in the query interface strategy should not be extremely large. Note that in this case,

we have

$$u(P, Q) = \sum_{i=1}^{m} \pi_i \sum_{j=1}^{n} P_{ij} \sum_{\ell=1}^{\tilde{m}} Q_{j\ell}\, r_{i\ell},$$

where $r : [m] \times [\tilde{m}] \to \mathbb{R}_+$ is the effectiveness measure between the intent $i$ and the result, i.e., decoded intent, $\ell$.
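To make the formula concrete, the sketch below evaluates $u(P, Q)$ by direct summation. It is a plain-Python illustration with our own toy matrices, not code from the paper.

```python
def expected_payoff(pi, P, Q, r):
    """u(P, Q) = sum_i pi_i sum_j P_ij sum_l Q_jl r_il."""
    m, n = len(P), len(P[0])
    m_tilde = len(Q[0])
    return sum(pi[i] * P[i][j] * Q[j][l] * r[i][l]
               for i in range(m) for j in range(n) for l in range(m_tilde))

# Toy instance: 2 intents, 2 queries, 2 results, uniform prior.
pi = [0.5, 0.5]
P  = [[1.0, 0.0], [0.0, 1.0]]   # user strategy: intents -> queries
Q  = [[1.0, 0.0], [0.0, 1.0]]   # database strategy: queries -> results
r  = [[1.0, 0.0], [0.0, 1.0]]   # effectiveness of result l for intent i
print(expected_payoff(pi, P, Q, r))  # prints 1.0: every intent gets its best result
```

Any ambiguity in either strategy matrix lowers this value; e.g., replacing Q with the uniform matrix halves the payoff in this instance.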

6.1 Reinforcement Learning for an Arbitrary Similarity Measure

As in [16], we consider the Roth-Erev reinforcement learning mechanism for the adaptation of the database. For the case where both the database and the user adapt their strategies, one can use the results in [16]. Let us discuss the database adaptation rule. The learning/adaptation happens over discrete time instances $t = 0, 1, 2, 3, \ldots$, where $t$ denotes the $t$-th interaction of the user and the database. We refer to $t$ simply as the iteration of the learning rule. With this, the reinforcement learning mechanism for the database adaptation is as follows:

a. Let $R(0) > 0$ be an $n \times \tilde{m}$ initial reward matrix whose entries are strictly positive.

b. Let $Q(0)$ be the initial database strategy with $Q_{j\ell}(0) = \frac{R_{j\ell}(0)}{\sum_{\ell'=1}^{\tilde{m}} R_{j\ell'}(0)} > 0$ for all $j \in [n]$ and $\ell \in [\tilde{m}]$.

c. For iterations $t = 1, 2, \ldots$, do
   i. If the user's query at time $t$ is $s(t)$, return a result $E(t) \in E$ with $P(E(t) = i' \mid s(t)) = Q_{s(t)i'}(t)$.
   ii. The user gives a reward $r_{ii'}$, where $i$ is the intent of the user at time $t$. Note that the reward depends both on the intent $i$ at time $t$ and on the result $i'$. Then, set
   $$R_{j\ell}(t+1) = \begin{cases} R_{j\ell}(t) + r_{i\ell} & \text{if } j = s(t) \text{ and } \ell = i' \\ R_{j\ell}(t) & \text{otherwise.} \end{cases} \qquad (2)$$
   iii. Update the database strategy by
   $$Q_{j\ell}(t+1) = \frac{R_{j\ell}(t+1)}{\sum_{\ell'=1}^{\tilde{m}} R_{j\ell'}(t+1)} \qquad (3)$$
   for all $j \in [n]$ and $\ell \in [\tilde{m}]$.

In the above scheme, $R(t)$ is simply the reward matrix at time $t$. A few comments are in order regarding the above adaptation rule:
- One can use available ranking functions, e.g. [7], for the initial reward condition $R(0)$, which possibly leads to an intuitive initial point for the learning rule. One may normalize and convert the scores returned by these functions to probability values.
- In step c.ii., if the database has knowledge of the user's intent after the interaction (e.g., through a click), it increments $R_{ji}$ by 1 for the known intent $e_i$. The mathematical analysis of both cases is similar.

- In the initial step, since the query interface uses a ranking function to compute the probabilities, it need not materialize the mapping between queries and results. As the game progresses, it maintains reward values only for the queries seen so far and their returned results. This technique is practical when the total number of submitted queries is of moderate size.
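The update above is lightweight: the database keeps only the reward matrix $R$, and $Q$ is just $R$ row-normalized. Below is a minimal sketch of one learning iteration (steps c.i-c.iii); the variable names and the toy reward values are ours, not the paper's.

```python
import random

def step(R, query_j, intent_i, r):
    """One iteration of steps c.i-c.iii: sample a result from the normalized
    row of R for the submitted query, then reinforce the chosen entry (Eq. 2).
    Renormalization (Eq. 3) is implicit, since Q is always R row-normalized."""
    weights = R[query_j]                     # proportional to Q_{j, .}
    result = random.choices(range(len(weights)), weights=weights)[0]
    R[query_j][result] += r[intent_i][result]
    return result

# Toy run: one query, two candidate results; result 1 perfectly satisfies intent 0.
random.seed(0)
R = [[1.0, 1.0]]          # strictly positive initial rewards (step a)
r = [[0.0, 1.0]]          # effectiveness measure: only result 1 is rewarded
for t in range(200):
    step(R, query_j=0, intent_i=0, r=r)
q_good = R[0][1] / sum(R[0])   # learned probability of returning the good result
print(round(q_good, 3))        # close to 1 after a few hundred iterations
```

Sampling with the raw reward row as weights is equivalent to sampling from $Q_{j,\cdot}$, so the strategy matrix never needs to be stored explicitly.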

6.2 Analysis of the Learning Rule

In this section, we analyze the reinforcement mechanism above and show that, statistically speaking, the adaptation rule improves the efficiency of the interaction. Since the user gives feedback on only one tuple in the result, one can, without loss of generality, make the following assumption.

ASSUMPTION 6.1. The cardinality $k$ of the result list is 1.

For the analysis of the reinforcement learning mechanism in Section 6, and to simplify notation, denote

$$u(t) := u_r(P, Q(t)), \qquad (4)$$

for an effectiveness measure $r$, where $u_r$ is defined in (1). We recall that a random process $\{X(t)\}$ is a submartingale [12] if it is absolutely integrable (i.e., $E(|X(t)|) < \infty$ for all $t$) and $E(X(t+1) \mid \mathcal{F}_t) \geq X(t)$, where $\mathcal{F}_t$ is the history, or $\sigma$-algebra, generated by $X(1), \ldots, X(t)$. In other words, $\{X(t)\}$ is a submartingale if the expected value of $X(t+1)$ given $X(t), X(t-1), \ldots, X(0)$ is not less than $X(t)$. Submartingales are simply the stochastic counterparts of monotonically increasing sequences, and as with monotonically increasing sequences that are bounded from above, they possess the same convergence property: any submartingale $\{X(t)\}$ with $E(|X(t)|) < B$ for some $B \in \mathbb{R}_+$ and all $t \geq 0$ converges almost surely. We refer the interested reader to [12] for further information on this result (the martingale convergence theorem).

The main result of this section is that the sequence of utilities $\{u(t)\}$ defined by (4) (which is indeed a stochastic process, as $\{Q(t)\}$ is a stochastic process) is a submartingale when the reinforcement learning rule in Section 6 is utilized. As a result, the proposed reinforcement learning rule stochastically improves the efficiency of communication between the database and the user. More importantly, this holds for an arbitrary reward/effectiveness measure $r$. This is a rather strong result, as the algorithm is robust to the choice of the reward mechanism. To show this, we first discuss an intermediate result. For simplicity of notation, we fix the time $t$, use the superscript $+$ to denote variables at time $(t+1)$, and drop the time dependence of variables at time $t$. Throughout the rest of our discussion, we let $\{\mathcal{F}_t\}$ be the natural filtration for the process $\{Q(t)\}$, i.e., $\mathcal{F}_t$ is the $\sigma$-algebra generated by $Q(0), \ldots, Q(t)$.

LEMMA 6.2. For any $\ell \in [\tilde{m}]$ and $j \in [n]$, we have

$$E(Q^+_{j\ell} \mid \mathcal{F}_t) - Q_{j\ell} = Q_{j\ell} \cdot \sum_{i=1}^{m} \pi_i P_{ij} \left( \frac{r_{i\ell}}{\bar{R}_j + r_{i\ell}} - \sum_{\ell'=1}^{\tilde{m}} Q_{j\ell'} \frac{r_{i\ell'}}{\bar{R}_j + r_{i\ell'}} \right),$$

where $\bar{R}_j = \sum_{\ell'=1}^{\tilde{m}} R_{j\ell'}$.

PROOF. Fix $\ell \in [\tilde{m}]$ and $j \in [n]$. Let $A$ be the event that at the $t$-th iteration we reinforce a pair $(j, \ell')$ for some $\ell' \in [\tilde{m}]$. Then, on the complement $A^c$ of $A$, $Q^+_{j\ell}(\omega) = Q_{j\ell}(\omega)$. Let $A_{i,\ell'} \subseteq A$ be the subset of $A$ on which the intent of the user is $i$ and the pair $(j, \ell')$ is reinforced. Note that the sets $\{A_{i,\ell'}\}$ for $i \in [m]$, $\ell' \in [\tilde{m}]$ are pairwise mutually exclusive and their union constitutes $A$. We note that

$$Q^+_{j\ell} = \sum_{i=1}^{m} \left( 1_{A_{i,\ell}} \frac{R_{j\ell} + r_{i\ell}}{\bar{R}_j + r_{i\ell}} + \sum_{\substack{\ell'=1 \\ \ell' \neq \ell}}^{\tilde{m}} 1_{A_{i,\ell'}} \frac{R_{j\ell}}{\bar{R}_j + r_{i\ell'}} \right) + Q_{j\ell}\, 1_{A^c}.$$

Therefore, we have

$$E(Q^+_{j\ell} \mid \mathcal{F}_t) = \sum_{i=1}^{m} \pi_i P_{ij} Q_{j\ell} \frac{R_{j\ell} + r_{i\ell}}{\bar{R}_j + r_{i\ell}} + \sum_{i=1}^{m} \pi_i P_{ij} \sum_{\ell' \neq \ell} Q_{j\ell'} \frac{R_{j\ell}}{\bar{R}_j + r_{i\ell'}} + (1 - p)\, Q_{j\ell},$$

where $p = P(A \mid \mathcal{F}_t)$. Note that $Q_{j\ell} = \frac{R_{j\ell}}{\bar{R}_j}$ and hence

$$E(Q^+_{j\ell} \mid \mathcal{F}_t) - Q_{j\ell} = \sum_{i=1}^{m} \pi_i P_{ij} Q_{j\ell} \frac{r_{i\ell}(\bar{R}_j - R_{j\ell})}{\bar{R}_j(\bar{R}_j + r_{i\ell})} - \sum_{i=1}^{m} \pi_i P_{ij} \sum_{\ell' \neq \ell} Q_{j\ell'} \frac{R_{j\ell}\, r_{i\ell'}}{\bar{R}_j(\bar{R}_j + r_{i\ell'})}.$$

Replacing $\frac{R_{j\ell}}{\bar{R}_j}$ with $Q_{j\ell}$ and rearranging the terms in the above expression, we get the result.

To show the main result, we utilize the following result from martingale theory.

THEOREM 6.3. [31] A random process $\{X_t\}$ converges almost surely if $X_t$ is bounded, i.e., $E(|X_t|) < B$ for some $B \in \mathbb{R}_+$ and all $t \geq 0$, and

$$E(X_{t+1} \mid \mathcal{F}_t) \geq X_t - \beta_t, \qquad (5)$$

where $\beta_t \geq 0$ is a summable sequence almost surely, i.e., $\sum_t \beta_t < \infty$ with probability 1.

Note that this result is a weaker form of the Robbins-Siegmund martingale convergence theorem in [31], but it serves the purpose of our discussion. Using Lemma 6.2 and the above result, we show that, up to a summable disturbance, the proposed learning mechanism is stochastically improving.

THEOREM 6.4. Let $\{u(t)\}$ be the sequence given by (4). Then, $E(u(t+1) \mid \mathcal{F}_t) \geq u(t) - \beta_t$ for some non-negative random process $\{\beta_t\}$ that is summable (i.e., $\sum_{t=0}^{\infty} \beta_t < \infty$ almost surely). As a result, $\{u(t)\}$ converges almost surely.

PROOF. Let $u^+ := u(t+1)$, $u := u(t)$, and

$$u_j := u_j(P(t), Q(t)) = \sum_{i=1}^{m} \sum_{\ell=1}^{\tilde{m}} \pi_i P_{ij} Q_{j\ell}\, r_{i\ell}(t),$$

and also define $\bar{R}_j := \sum_{\ell'=1}^{\tilde{m}} R_{j\ell'}$. Note that $u_j$ is the efficiency of the $j$-th signal/query. Using the linearity of conditional expectation and Lemma 6.2, we have:

$$E(u^+ \mid \mathcal{F}_t) - u = \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{\ell=1}^{\tilde{m}} \pi_i P_{ij}\, r_{i\ell} \left( E(Q^+_{j\ell} \mid \mathcal{F}_t) - Q_{j\ell} \right)$$
$$= \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{\ell=1}^{\tilde{m}} \pi_i P_{ij} Q_{j\ell}\, r_{i\ell} \left( \sum_{i'=1}^{m} \pi_{i'} P_{i'j} \left( \frac{r_{i'\ell}}{\bar{R}_j + r_{i'\ell}} - \sum_{\ell'=1}^{\tilde{m}} Q_{j\ell'} \frac{r_{i'\ell'}}{\bar{R}_j + r_{i'\ell'}} \right) \right). \qquad (6)$$

Now, let $y_{j\ell} = \sum_{i=1}^{m} \pi_i P_{ij}\, r_{i\ell}$ and $z_{j\ell} = \sum_{i=1}^{m} \pi_i P_{ij} \frac{r_{i\ell}}{\bar{R}_j + r_{i\ell}}$. Then, we get from the above expression that

$$E(u^+ \mid \mathcal{F}_t) - u = \sum_{j=1}^{n} \left( \sum_{\ell=1}^{\tilde{m}} Q_{j\ell}\, y_{j\ell}\, z_{j\ell} - \sum_{\ell=1}^{\tilde{m}} Q_{j\ell}\, y_{j\ell} \sum_{\ell'=1}^{\tilde{m}} Q_{j\ell'}\, z_{j\ell'} \right). \qquad (7)$$

Now, we express the above as

$$E(u^+ \mid \mathcal{F}_t) - u = V_t + \tilde{V}_t, \qquad (8)$$

where

$$V_t = \sum_{j=1}^{n} \frac{1}{\bar{R}_j} \left( \sum_{\ell=1}^{\tilde{m}} Q_{j\ell}\, y_{j\ell}^2 - \left( \sum_{\ell=1}^{\tilde{m}} Q_{j\ell}\, y_{j\ell} \right)^2 \right)$$

and

$$\tilde{V}_t = \sum_{j=1}^{n} \left( \sum_{\ell=1}^{\tilde{m}} Q_{j\ell}\, y_{j\ell} \sum_{\ell'=1}^{\tilde{m}} Q_{j\ell'}\, \tilde{z}_{j\ell'} - \sum_{\ell=1}^{\tilde{m}} Q_{j\ell}\, y_{j\ell}\, \tilde{z}_{j\ell} \right). \qquad (9)$$

Further, $\tilde{z}_{j\ell} = \sum_{i=1}^{m} \pi_i P_{ij} \frac{r_{i\ell}^2}{\bar{R}_j(\bar{R}_j + r_{i\ell})}$.

We claim that $V_t \geq 0$ for each $t$ and that $\{\tilde{V}_t\}$ is a summable sequence almost surely. Then, from (8) and Theorem 6.3, we get that $\{u(t)\}$ converges almost surely, which completes the proof. Next, we validate our claims.

We first show that $V_t \geq 0$ for all $t$. Note that $Q$ is a row-stochastic matrix and hence $\sum_{\ell=1}^{\tilde{m}} Q_{j\ell} = 1$. Therefore, by Jensen's inequality [12], we have:

$$\sum_{\ell=1}^{\tilde{m}} Q_{j\ell}\, y_{j\ell}^2 \geq \left( \sum_{\ell=1}^{\tilde{m}} Q_{j\ell}\, y_{j\ell} \right)^2. \qquad (10)$$

Hence, $V_t \geq 0$. We next claim that $\{\tilde{V}_t\}$ is a summable sequence with probability one. It can be observed from (9) that

$$\tilde{V}_t \leq \sum_{j=1}^{n} \frac{2\tilde{m}}{\bar{R}_j^2},$$

since $y_{j\ell} \leq 1$ and $\tilde{z}_{j\ell} \leq \bar{R}_j^{-2}$ for each $j \in [n]$, $\ell \in [\tilde{m}]$, and $Q$ is a row-stochastic matrix. To prove the claim, it suffices to show that for each $j \in [n]$, the sequence $\{\frac{1}{\bar{R}_j^2(t)}\}$ is summable. Note that for each $j \in [n]$ and each $t$, we have $\bar{R}_j(t+1) = \bar{R}_j(t) + \epsilon_t$, where $\epsilon_t \geq \epsilon > 0$ with probability $p_t \geq p > 0$. Therefore, using the Borel-Cantelli lemma for adapted processes [12], $\{\frac{1}{\bar{R}_j^2(t)}\}$ is summable, which concludes the proof.

The above result implies that the effectiveness of the query interface, stochastically speaking, increases as time progresses when the learning rule in Section 6 is utilized. Moreover, this property does not depend on the choice of the effectiveness function (i.e., $r_{i\ell}$). This is indeed a desirable property for any adaptation/learning scheme for the database.
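This stochastic-improvement property can be observed numerically. The following toy simulation is our own instance (two intents, two queries, two results, a fixed pure user strategy, and an arbitrarily scaled reward matrix), not an experiment from the paper; it runs the update of Section 6.1 and compares $u(t)$ of Eq. (4) at the start and end of the run.

```python
import random

def payoff(pi, P, R, r):
    """u(P, Q) of Eq. (4), with Q obtained by row-normalizing the reward matrix R."""
    u = 0.0
    for j in range(len(R)):
        s = sum(R[j])
        for i in range(len(pi)):
            for l in range(len(R[j])):
                u += pi[i] * P[i][j] * (R[j][l] / s) * r[i][l]
    return u

random.seed(1)
pi = [0.5, 0.5]                    # uniform prior over two intents
P  = [[1.0, 0.0], [0.0, 1.0]]      # fixed (pure) user strategy: intent i -> query i
r  = [[1.0, 0.0], [0.0, 0.5]]      # an arbitrarily scaled effectiveness measure
R  = [[1.0, 1.0], [1.0, 1.0]]      # strictly positive initial rewards (step a)

u_start = payoff(pi, P, R, r)
for t in range(500):
    i = random.choices([0, 1], weights=pi)[0]      # user's intent
    j = random.choices([0, 1], weights=P[i])[0]    # submitted query
    q = [x / sum(R[j]) for x in R[j]]              # current row of Q
    l = random.choices([0, 1], weights=q)[0]       # returned result
    R[j][l] += r[i][l]                             # reward update, Eq. (2)
u_end = payoff(pi, P, R, r)
print(round(u_start, 3), round(u_end, 3))          # u(t) grows toward its cap of 0.75
```

Note that the two rows of $r$ use different reward scales (1.0 vs. 0.5), yet the payoff still trends upward, illustrating the robustness to the choice of reward measure claimed above.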

7. CONCLUSION & FUTURE WORK

We modeled the interaction between the user and the database query interface as a repeated signaling game, where the players start with different mappings between signals, i.e., queries, and objects, i.e., desired entities, and would like to reach an effective decoding mechanism. We proposed an adaptation mechanism for the query interface to learn the signaling strategy of the user and proved that this mechanism increases the expected payoff for both the user and the query interface on average and converges almost surely. We also proposed the k-list learning dynamics and showed that this algorithm also improves the efficiency of the database system. We plan to explore the possible equilibria of this game when the user modifies her signaling strategy at a different rate than the query interface. Further, it may be challenging to efficiently maintain and update the signaling strategy of the query interface for very large databases; we plan to investigate and address these challenges. Also, the proposed k-list learning rule does not utilize an effectiveness measure between the user's intent and the decoded intents. One possible future research direction is to generalize this algorithm and its analysis to arbitrary effectiveness measures.

8. REFERENCES
[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases: The Logical Level. Addison-Wesley, 1994.
[2] A. Abouzied, D. Angluin, C. H. Papadimitriou, J. M. Hellerstein, and A. Silberschatz. Learning and verifying quantified boolean queries by example. In PODS, pages 49–60, 2013.
[3] I. Abraham, D. Dolev, R. Gonen, and J. Halpern. Distributed computing meets game theory: robust mechanisms for rational secret sharing and multiparty computation. In PODC, 2006.
[4] P. Avesani and M. Cova. Shared lexicon for distributed annotations on the Web. In WWW, pages 207–214, 2005.
[5] T. Beckers et al. Report on INEX 2009. SIGIR Forum, 44(1):38–57, 2010.
[6] A. Bonifati, R. Ciucanu, and S. Staworko. Learning join queries from user examples. TODS, 2015.
[7] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum. Probabilistic information retrieval approach for ranking of database query results. ACM Trans. Database Syst., 31(3):1134–1168, 2006.
[8] Y. Chen, W. Wang, Z. Liu, and X. Lin. Keyword search on structured and semi-structured data. In SIGMOD, 2009.
[9] I. Cho and D. Kreps. Signaling games and stable equilibria. Quarterly Journal of Economics, 102:179–221, 1987.
[10] K. Dimitriadou, O. Papaemmanouil, and Y. Diao. Explore-by-example: An automatic query steering framework for interactive data exploration. In SIGMOD, 2014.
[11] M. C. Donaldson, M. Lachmann, and C. T. Bergstrom. The evolution of functionally referential meaning in a structured world. Journal of Mathematical Biology, 246:225–233, 2007.
[12] R. Durrett. Probability: Theory and Examples. Cambridge University Press, 2010.
[13] R. Fagin, B. Kimelfeld, Y. Li, S. Raghavan, and S. Vaithyanathan. Understanding queries in a search database system. In PODS, 2010.
[14] G. H. L. Fletcher, J. V. D. Bussche, D. V. Gucht, and S. Vansummeren. Towards a theory of search queries. In ICDT, 2009.
[15] N. Fuhr and T. Rolleke. A probabilistic relational algebra for the integration of information retrieval and database systems. TOIS, 15, 1997.
[16] Y. Hu, B. Skyrms, and P. Tarrès. Reinforcement learning in signaling game. arXiv preprint arXiv:1103.5818, 2011.
[17] S. Idreos, O. Papaemmanouil, and S. Chaudhuri. Overview of data exploration techniques. In SIGMOD, 2015.
[18] H. V. Jagadish, A. Chapman, A. Elkiss, M. Jayapandian, Y. Li, A. Nandi, and C. Yu. Making database systems usable. In SIGMOD, 2007.
[19] N. Khoussainova, Y. Kwon, M. Balazinska, and D. Suciu. SnipSuggest: Context-aware autocompletion for SQL. PVLDB, 4(1), 2010.
[20] B. Kimelfeld and Y. Sagiv. Finding and approximating top-k answers in keyword proximity search. In PODS, 2005.
[21] D. Lewis. Convention. Cambridge: Harvard University Press, 1969.
[22] H. Li, C.-Y. Chan, and D. Maier. Query from examples: An iterative, data-driven approach to query construction. PVLDB, 8(13), 2015.
[23] J. Luo, S. Zhang, and H. Yang. Win-win search: Dual-agent stochastic game in session search. In SIGIR, 2014.
[24] J. Luo, S. Zhang, and H. Yang. Towards a game-theoretic framework for information retrieval. In SIGIR, 2015.
[25] Q. Ma, S. Muthukrishnan, B. Thompson, and G. Cormode. Modeling collaboration in academia: A game theoretic approach. In BigScholar, 2014.
[26] D. Maier, D. Rozenshtein, S. Salveter, J. Stein, and D. S. Warren. Toward logical data independence: A relational query language without relations. In SIGMOD, 1982.
[27] C. Manning, P. Raghavan, and H. Schutze. An Introduction to Information Retrieval. Cambridge University Press, 2008.
[28] A. Nandi and H. V. Jagadish. Guided interaction: Rethinking the query-result paradigm. PVLDB, 102, 2011.
[29] A. Das Sarma, A. Parameswaran, H. Garcia-Molina, and J. Widom. Synthesizing view definitions from data. In ICDT, 2010.
[30] M. A. Nowak and D. C. Krakauer. The evolution of language. PNAS, 96(14), 1999.
[31] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. In Herbert Robbins Selected Papers, pages 111–135. Springer, 1985.
[32] Y. Shoham. Computer science and game theory. Commun. ACM, 51(8), 2008.
[33] A. Termehchy and B. Touri. A signaling game approach to database querying and interaction. In ICTIR, 2015.
[34] B. Touri and C. Langbort. Language evolution in a noisy environment. In American Control Conference (ACC), pages 1938–1943. IEEE, 2013.
[35] Q. Tran, C. Chan, and S. Parthasarathy. Query by output. In SIGMOD, 2009.
[36] P. Trapa and M. Nowak. Nash equilibria for an evolutionary language game. Journal of Mathematical Biology, 41:172–188, 2000.
[37] M. Vardi. The universal relation data model for logical independence. IEEE Software, 5:80–85, 1988.