In Anthony Jameson, Cécile Paris, and Carlo Tasso (Eds.), User Modeling: Proceedings of the Sixth International Conference, UM97. Vienna, New York: Springer Wien New York. © CISM, 1997. Available on−line from http://um.org.
Towards a Bayesian Model for Keyhole Plan Recognition in Large Domains

David W. Albrecht, Ingrid Zukerman, Ann E. Nicholson, and Ariel Bud*
Department of Computer Science, Monash University, Australia

* This research was supported in part by grant A49600323 from the Australian Research Council. The authors are indebted to Michael McGaughey for writing the data collection programs for the MUD and for his assistance during the initial stages of this project.
Abstract. We present an approach to keyhole plan recognition which uses a Dynamic Belief Network to represent features of the domain that are needed to identify users’ plans and goals. The structure of this network was determined from analysis of the domain. The conditional probability distributions are learned during a training phase, which dynamically builds these probabilities from observations of user behaviour. This approach allows the use of incomplete, sparse and noisy data during both training and testing. We present experimental results of the application of our system to a Multi-User Dungeon adventure game with thousands of possible actions and positions. These results show a high degree of predictive accuracy and indicate that this approach will work in other domains with similar features.
1 Introduction

To date, research in plan recognition has focused on three main areas: (1) inferring plans during cooperative interactions, (2) understanding stories, and (3) recognising the plans of an agent who is unaware that his/her plans are being inferred (Raskutti, 1993). In the first two areas, the plan recognition process is intended, since a user/writer is attempting to convey his/her plan to the system. In addition, during cooperative interactions, a plan recognition system can interrogate the user when confronted with ambiguous or incomplete information (e.g., Allen and Perrault, 1980, Litman and Allen, 1987, Raskutti and Zukerman, 1991). The third area is called keyhole plan recognition because the information available to the plan recogniser is gleaned from noninteractive and often incomplete observations of a user (as though one were looking into a room through a keyhole).

In the past, the use of hand-crafted plan libraries in systems that perform keyhole plan recognition imposed heavy restrictions on the size of their application domain, and hence on their usefulness. However, several researchers have recently applied machine learning techniques to the acquisition of plan libraries in an effort to overcome this problem (Lesh and Etzioni, 1995, Forbes et al., 1995) (Section 2). The mechanism described in this paper is part of this trend. Our approach to keyhole plan recognition uses a Dynamic Belief Network (DBN) to represent features of the domain needed to identify users' plans and goals. Our current domain is the "Shattered Worlds" Multi-User Dungeon (MUD), an adventure game which resembles the real world in its complexity and size (Section 3). The MUD is a text-based virtual reality game where players compete for limited
resources in an attempt to achieve various goals. The MUD has over 4,700 locations, over 7,200 actions, and 20 different quests (goals).

The objective of the plan recognition mechanism is to determine, as early as possible, which quest a player is attempting, and to predict which action a player will perform in the next move and which location a player will go to next. To achieve this, the system must first learn which actions and positions, or sequences of actions and positions, tend to lead to a particular quest. This information is obtained from previous instances of completed quests during a training phase and modelled by means of a DBN (Section 4). During the testing phase, the DBN is used to predict a player's quest, next action and next location. To this end, every time a player performs an action, the system updates the probability that the player is trying to achieve each of the quests, and the probabilities that the player will next perform each of the actions and move to each of the locations. The empirical results obtained by our system using this method are described in Section 5. Section 6 discusses ideas for future work and presents concluding remarks.
2 Related Work

In recent times there has been a shift from systems that rely heavily on hand-coded domain knowledge for plan recognition towards systems that apply machine learning techniques to automatically acquire domain knowledge. This has allowed a shift in domain size, whereby later systems deal with hundreds of actions in realistic domains.

The systems described by Cañamero et al. (1992) and Wærn and Stenborg (1995) rely on domain knowledge. Cañamero et al. use an abstraction/specialisation plan hierarchy to perform plan recognition from noisy input representing sequences of observations of an evolving situation in traffic monitoring. Wærn and Stenborg use a hierarchy of actions in conjunction with "compiled" plans in order to anticipate a user's intentions in domains where users exhibit reactive rather than plan-based behaviour, e.g., news reading. They perform simple probabilistic calculations to match a user's actions in a particular time window to those in the domain plans. The system described by Bauer (1996) uses a plan hierarchy to represent the actions in the domain, but it applies decision trees (Quinlan, 1983) to obtain probabilities of different domain plans in the context of a user's actions. It then uses the Dempster-Shafer theory of evidential reasoning to assess hypotheses regarding a user's plans in context. Carberry (1990) also applies the Dempster-Shafer theory, using threshold plausibility and different levels of belief to distinguish among competing hypotheses.

The plan recognition mechanism described in Lesh and Etzioni (1995) works on a graph which represents the relations between the actions and possible goals of the domain. The system iteratively applies pruning rules which remove from the graph goals that are not in any consistent plan. In later work, they automatically construct a virtual plan library using primitive actions and the predicates of goals (Lesh and Etzioni, 1996). Two important differences between our system and Lesh and Etzioni's are: (1) they assume that any action performed by a user pertains to one of the goals in their virtual library, while our mechanism admits extraneous actions; and (2) at present, the user's goals in our system (MUD quests) are well specified, while Lesh and Etzioni admit arbitrary goals. In the future, we intend to extend our mechanism to domains with such goals, e.g., the WWW and Unix.

Charniak and Goldman (1993) use Bayesian networks[1] for plan recognition in the framework of story understanding. They dynamically generate a Bayesian network from a sequence of
[1] See Section 4.1 for more details about Bayesian networks.
observations by applying rules which use plan knowledge to instantiate the network. The incorporation of prior probabilities into this network supports the selection of plausible explanations of observed actions. Pynadath and Wellman (1995) and Forbes et al. (1995) use Bayesian networks for plan recognition in traffic monitoring. Pynadath and Wellman use a Bayesian network composed of loosely connected sub-networks, where each sub-network captures an intermediate structure based on one of the following factors: the context in which a plan was generated, the mental state and planning process of the agent, and the consequences of the agent's actions in the world. They apply the mechanism described by Huber et al. (1994) to map planning actions to a Bayesian network. Forbes et al. use Dynamic Bayesian Networks, emphasising issues that pertain to sensor noise or failure, and to uncertainty about the behaviour of other vehicles and about the effects of drivers' actions. Finally, Russell et al. (1995) use a gradient-descent algorithm to learn the conditional probability tables for Bayesian networks with hidden variables, i.e., variables whose values are not observable.[2]

The mechanism described in this paper most closely resembles the system described by Forbes et al. (1995), but there are several important differences: (1) we infer a user's longer term goals, i.e., quests, in addition to the locations and actions inferred by Forbes et al.; (2) our data was collected prior to the undertaking of this project, hence we had no choice in the view of the world that we are modelling, rather than being able to select the observations we wish to make; (3) we observe the world only from the perspective of a single user (without information about the effect of other agents' actions on the world); and (4) we have no information regarding the quality of our observations, while they have information about sensor uncertainty and hence are able to model it.
3 The Domain

The domain of our implementation is the "Shattered Worlds" Multi-User Dungeon (MUD), which is a text-based virtual reality game where players compete for limited resources in an attempt to achieve various goals. As stated in Section 1, the MUD has over 4,700 locations, more than 7,200 actions, and 20 different quests (goals). The plan recognition problem is further exacerbated by the presence of spelling mistakes, newly defined commands and abbreviations for commands. The MUD also has reactive agents controlled by the system (non-player characters), and contains a number of items which may be acquired and used by characters in order to achieve some effect within the game.

Although the MUD is a game, only a minority of the players log in to play. Many users log in with other goals, such as socialising with other players, crashing the MUD, or engaging in socially aberrant behaviour. However, at this stage of our project, we are only interested in recognising one type of goal, namely quests. Examples of the simplest quests in the MUD are the "Teddy-bear rescue", which involves locating and retrieving a teddy bear lost by a non-player character called Jane, and "Wood chop", where a player must chop wood in the market place, after first acquiring an axe and eating food to obtain enough energy to carry out the wood-chopping task. More complex quests may involve solving non-trivial puzzles, interacting with various non-player characters, e.g., monsters, shopkeepers or mercenaries, or achieving a number of sub-goals, e.g., obtaining potions. Players usually know which quest or quests they wish to achieve, but they don't always know which actions are required to complete a quest. In addition, they often engage in activities that are not related to the completion
[2] A survey of research on learning belief networks is given by Heckerman (1995).
Table 1. Sample data for the Avatar quest.

Action No.  Time       Player    Location                              Action
1           773335156  spillage  room/city/inn                         ENTERS
12          773335264  spillage  players/paladin/room/trading post     buy
17          773335291  spillage  players/paladin/room/western gate     bribe
28          773335343  spillage  players/paladin/room/abby/guardhouse  kill
37          773335435  spillage  players/paladin/room/abby/stores      search
40          773335451  spillage  players/paladin/room/shrine/Billy     worship
54          773335558  spillage  players/paladin/room/brooksmith       give
60          773335593  spillage  players/paladin/room/shrine/Dredd     avenger
62          773335596  spillage  players/paladin/room/abby/chamber     Avatar quest
of a specific quest, such as chatting with other players or fighting with MUD agents. As a result, players typically perform between 25 and 500 actions until they complete a quest, even though only a fraction of these actions may actually be required to achieve the quest.

Analysis of the MUD yields the following features:[3] (1) it is not possible to obtain a perspicuous representation of the domain (for example, in the MUD there is a vast number of actions whose effects and preconditions are not fully known); (2) there may be more than one way to achieve a goal; (3) some sequences of actions may lead to more than one eventual goal; (4) some actions leading to a goal may need to be performed in sequence, while other actions are order-independent; (5) users may interleave the actions performed to achieve two or more goals or may perform actions that are not related to any domain goal (e.g., socialising); (6) the states of the system are only partially observable; (7) the plan inference mechanism obtains information mainly from a user's keyboard commands, i.e., the mechanism has limited information about the user's knowledge and ability; and (8) the outcome of the actions is uncertain, i.e., the performance of an action is not a sufficient condition for the achievement of the action's intended effect (e.g., due to the presence of other agents who affect the states of the system).

The MUD software collects the actions performed by each player and the quest instance each player completed. In the current implementation, each data point is composed of: (1) a time stamp, (2) the name of the player, (3) the number of the login session, (4) the location where the action was executed, and (5) the name of the action. A DBN is then constructed on the basis of the collected data as described in Section 4. Table 1 illustrates some of the 62 actions performed by a player to achieve the Avatar quest (the number of the login session is not shown).[4] Without domain knowledge, it is extremely difficult to determine by inspection which of these actions (if any) are necessary to complete the quest, the order of the necessary actions, or whether an action had the intended outcome.
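To make the shape of these records concrete, the sketch below (our own illustration, not the authors' collection software) groups hypothetical log records into runs, one run per completed quest. The Record fields follow the five components listed above; the test for a quest-completion record is an assumption modelled on the last row of Table 1.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Record:
    time: int        # time stamp
    player: str      # name of the player
    session: int     # number of the login session
    location: str    # location where the action was executed
    action: str      # name of the action (first word of the command)

def split_into_runs(records):
    """Group records by player and close a run each time a quest is completed.

    Returns (player, quest_name, records_of_run) triples.
    """
    runs = []
    pending = defaultdict(list)                # player -> records since last quest
    for rec in sorted(records, key=lambda r: r.time):
        pending[rec.player].append(rec)
        if rec.action.endswith("quest"):       # e.g. "Avatar quest" (assumed marker)
            runs.append((rec.player, rec.action, pending.pop(rec.player)))
    return runs
```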
4 The Model

In this section we identify the interesting domain variables, and show how to represent their dependencies and how they change over time using a belief network representation.
[3] Other domains which we intend to investigate, viz. the WWW and Unix, have most of these features.
[4] At present, the MUD software does not record keyboard commands regarding an agent's movements on the horizontal plane, i.e., North, South, East and West. In addition, only the first word of each command is considered during training and testing.
4.1 Belief Networks

Belief (or Bayesian) networks (Pearl, 1988) have become a popular representation for reasoning under uncertainty, as they integrate a graphical representation of causal relationships with a sound Bayesian foundation. Belief networks are directed acyclic graphs where nodes correspond to random variables. The relationship between any set of state variables can be specified by a joint probability distribution. The nodes in the network are connected by directed arcs, which may be thought of as causal or influence links; a node is influenced by its parents. The connections also specify independence assumptions between nodes, which allow the joint probability distribution of all the state variables to be specified by exponentially fewer probability values than the full joint distribution. A conditional probability distribution (CPD) is associated with each node. The CPD gives the probability of each node value for all combinations of the values of its parent nodes. The probability distribution for a node with no predecessors is its prior distribution. Given these priors and the CPDs, we can compute posterior probability distributions for all the nodes in a network, which represent beliefs about the values of these nodes. Observation of specific values for nodes is called evidence. Beliefs are updated by re-computing the posterior probability distributions given the evidence.

Belief networks have been used in various applications which initially were static, i.e., the nodes and links do not change over time. These applications involve determining the structure of the network; supplying the prior probabilities for root nodes and conditional probabilities for other nodes; adding or retracting evidence about nodes; and repeating the belief updating algorithm for each change in evidence. More recently, researchers have used belief networks in dynamic domains, where the world changes and the focus is on reasoning over time (Dean and Wellman, 1991, Dagum et al., 1992, Nicholson and Brady, 1994). Such dynamic applications include the automated vehicle control (Forbes et al., 1995) and traffic plan recognition (Pynadath and Wellman, 1995) described in Section 2. In such applications the network grows over time, as the state of each domain variable at different times is represented by a series of nodes. Typically, for these dynamic networks, the connections over time are Markovian, and a temporal 'window' is imposed to constrain the state space to some extent. Such networks provide a more compact representation than the equivalent Hidden Markov Model (Russell et al., 1995).

4.2 Network Nodes and Structure

Based on the data we have for our domain, the domain variables, which are represented as nodes in the belief network, are as follows:

Action (A): This variable represents the possible actions a player may take in the MUD, which we take to be the first string of non-blank characters entered by a user, plus the special other action, which includes all previously unseen actions. For the results given in this paper, the state space size, |A|, is 7,259.

Location (L): This variable represents the possible locations of a player, plus the special other location, which includes all previously unseen locations. For the results given in this paper, the state space size, |L|, is 4,722.[5]
[5] Future work includes using the hierarchical structure of the location data (Section 6).
[Figure 1. Dynamic Belief Network for the MUD: quest nodes Q and Q', and chains of action nodes A0, A1, A2, A3, ... and location nodes L0, L1, L2, L3, ..., with each Ai and Li depending on Q' and on the previous action and location, respectively.]
Quest (Q): This variable represents the 22 different quests a player may undertake, including the other quest, which includes all previously unseen quests, and the null quest. The variable representing the previous quest achieved is set to null if the user has just started a session.

A simple dynamic belief network structure for the domain is shown in Figure 1. This network is not a pure dynamic belief network; the changes in the action and location over time are represented, but it is assumed that a player's current quest does not change. Our model makes minimal assumptions about the dependencies between the domain variables. The action and location variables, Ai and Li, at the ith time step depend on the current quest being undertaken and the previous action and location, respectively.[6] The current quest, Q', depends on the previous quest, Q.

4.3 Probabilities and Belief Updating

The CPDs are constructed from the collected MUD data as follows. The data is pre-processed to take the following form:

Previous Quest  Current Quest  Current Action  Current Location    Next Action  Next Location
null            teddy          scream          room/sewer/sewer20  u            room/city/alley1
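A minimal sketch of this pre-processing step, under the assumption that each run is available as a previous quest, a current quest, and the observed sequence of (action, location) pairs; the function and field names are ours.

```python
def preprocess_run(previous_quest, current_quest, steps):
    """Turn one run into rows of the six-field form shown above.

    steps: the observed [(action, location), ...] sequence of the run.
    """
    rows = []
    for (cur_act, cur_loc), (next_act, next_loc) in zip(steps, steps[1:]):
        rows.append({
            "previous_quest": previous_quest,
            "current_quest": current_quest,
            "current_action": cur_act,
            "current_location": cur_loc,
            "next_action": next_act,
            "next_location": next_loc,
        })
    return rows

# preprocess_run("null", "teddy", [("scream", "room/sewer/sewer20"),
#                                  ("u", "room/city/alley1")])
# yields the example row shown above.
```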
A frequency count is maintained for each entry in the CPD that is observed. In order to account for the possible actions, locations and quests that do not occur in the training data, we adjust the frequencies so that the resulting CPD includes some probability that the other value may occur. This adjustment consists of adding a small number that corresponds to Good's flattening constant (Good, 1965) or Heckerman's fractional updating (Heckerman, 1995). A factor of 0.5 was used for the results obtained in this paper (Wallace and Freeman, 1987). The frequencies are then converted into the CPD.

Once the DBN is constructed, new data from a user is added to the network as evidence, and belief updating is performed to give predictions for that user's next action and location, and to update the belief as to the current quest being undertaken.
[6] In Section 6 we describe future work with variations of this simple model.
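The following sketch shows one way this frequency-count construction with a flattening constant of 0.5 might be implemented; the nested-dictionary layout of the CPD and the helper names are our own choices, not anything prescribed by the paper.

```python
from collections import Counter, defaultdict

FLATTENING = 0.5   # Good's flattening constant, as used in the paper

def build_cpd(rows, child, parents, child_states):
    """Estimate Pr(child | parents) from pre-processed rows with smoothing.

    child_states must include the special 'other' value so that unseen
    children still receive some probability mass.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[tuple(row[p] for p in parents)][row[child]] += 1

    cpd = {}
    for parent_key, counter in counts.items():
        total = sum(counter.values()) + FLATTENING * len(child_states)
        cpd[parent_key] = {s: (counter[s] + FLATTENING) / total
                           for s in child_states}
    return cpd

# e.g. the location CPD Pr(next location | current location, current quest):
# cpd_L = build_cpd(rows, "next_location",
#                   ("current_location", "current_quest"), location_states)
```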
1. Receive initial data: PreviousQuest, NullAction, NullLocation.
2. Add data as evidence for nodes Q, A0 and L0.
3. Perform belief updating on nodes Q', A1 and L1.
4. Loop from n = 1 until quest is achieved:
   4.1 Receive new data: Action, Location.
   4.2 Add data as evidence for nodes An and Ln.
   4.3 Perform belief updating on nodes Q', An+1 and Ln+1.
   4.4 n = n + 1.
Figure 2. Belief updating algorithm.
The evidence nodes for the domain at time step n+1 are: the last completed quest, Q, the previous actions, A0, ..., An, and the previous locations, L0, ..., Ln. The belief updating algorithm is given in Figure 2.

Belief propagation for singly-connected networks can be done efficiently using a message passing algorithm (Pearl, 1988). When networks are multiply-connected (i.e., when there is a loop in the underlying undirected graph), simple belief propagation is not possible; informally, this is because we can no longer be sure that evidence has not already been counted at a node, having arrived via another route. In such cases, inference algorithms based on clustering, conditioning or stochastic simulation may be used. Although there are underlying loops in the network structure shown in Figure 1, further analysis of the structure, together with the location of the evidence nodes, identifies d-separations (Pearl, 1988), indicating that certain nodes are conditionally independent. Using these independence relations, we simplify the belief update equations for the first time step:[7]

  Pr(L1 = l1 | q, a0, l0) = Σ_{q'} Pr(L1 = l1 | l0, q') Pr(q' | q),
  Pr(A1 = a1 | q, a0, l0) = Σ_{q'} Pr(A1 = a1 | a0, q') Pr(q' | q),
  Pr(Q' = q' | q, a0, l0) = Pr(Q' = q' | q).

For step n+1 we have

  Pr(Ln+1 = ln+1 | q, a0, l0, ..., an, ln) = Σ_{q'} Pr(Ln+1 = ln+1 | ln, q') Pr(q' | q, a0, l0, ..., an, ln),
  Pr(An+1 = an+1 | q, a0, l0, ..., an, ln) = Σ_{q'} Pr(An+1 = an+1 | an, q') Pr(q' | q, a0, l0, ..., an, ln),
  Pr(Q' = q' | q, a0, l0, ..., an+1, ln+1) = α Pr(ln+1 | ln, q') Pr(an+1 | an, q') Pr(Q' = q' | q, a0, l0, ..., an, ln),

where α is a normalizing factor.
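This recursion can be implemented directly. The sketch below is one possible rendering of the belief updating algorithm of Figure 2 under the simplified equations; the CPD layout matches the build_cpd sketch earlier, and the class and method names are ours. In practice, unseen values are mapped to the special other state, so the products below do not all vanish.

```python
def normalise(dist):
    """Scale a dictionary of non-negative scores so that it sums to 1."""
    z = sum(dist.values())
    return {k: (v / z if z else 0.0) for k, v in dist.items()}

class QuestPredictor:
    """Sketch of the belief updating loop of Figure 2.

    cpd_Q[q][q'] approximates Pr(Q' = q' | Q = q);
    cpd_A[(a, q')][a'] approximates Pr(next action a' | current action a, Q' = q');
    cpd_L is analogous for locations (as built by build_cpd above).
    """

    def __init__(self, cpd_Q, cpd_A, cpd_L, previous_quest):
        self.cpd_A, self.cpd_L = cpd_A, cpd_L
        # Steps 1-3: before any real observation, the belief over the current
        # quest Q' is just Pr(Q' | Q = previous_quest).
        self.quest_belief = dict(cpd_Q[previous_quest])
        self.last_action = None      # A_n
        self.last_location = None    # L_n

    def observe(self, action, location):
        """Steps 4.2/4.3: fold a new (action, location) pair into the quest belief.
        The very first call just records the initial (null) action and location."""
        if self.last_action is not None:
            self.quest_belief = normalise({
                q: self.cpd_A.get((self.last_action, q), {}).get(action, 0.0)
                   * self.cpd_L.get((self.last_location, q), {}).get(location, 0.0)
                   * p_q
                for q, p_q in self.quest_belief.items()})
        self.last_action, self.last_location = action, location

    def predict_next(self):
        """Predictive distributions over the next action and next location,
        obtained by summing out the current quest Q'."""
        next_a, next_l = {}, {}
        for q, p_q in self.quest_belief.items():
            for a, p in self.cpd_A.get((self.last_action, q), {}).items():
                next_a[a] = next_a.get(a, 0.0) + p * p_q
            for l, p in self.cpd_L.get((self.last_location, q), {}).items():
                next_l[l] = next_l.get(l, 0.0) + p * p_q
        return next_a, next_l
```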
5 Experimental Results

Methodology. A run is a sequence of action-location pairs, beginning either after a player enters the MUD or after a player completes a previous quest, and ending when a new quest is achieved. A certain percentage of the 4,981 runs in our corpus, chosen randomly, is used for training, and the remaining runs are used for testing.[8] All results presented in this section are for 80% training and 20% testing, except where otherwise indicated. During each test run, we used the belief updating algorithm shown in Figure 2.
[7] Details on the simplification of these formulae are given by Albrecht et al. (1997).
[8] During the testing phase, a value which was not seen in the training data gets classified as other. A prediction of other is always considered incorrect.
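A sketch of this methodology, assuming the runs are held in a Python list; the seed, the helper names and the mapping of unseen values to other (footnote 8) are illustrative only.

```python
import random

def train_test_split(runs, train_fraction=0.8, seed=0):
    """Randomly assign whole runs to the training or the testing set."""
    rng = random.Random(seed)
    shuffled = list(runs)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def map_to_other(value, seen_values):
    """Footnote 8: a value never seen in training is classified as 'other'."""
    return value if value in seen_values else "other"
```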
[Figure 3. Predictions for spillage (80% training): (a) quest, (b) location, and (c) action. In (b) and (c), the solid lines represent the probability of the next action/location, and the dashed lines the ratio of this probability to the probability of the most likely action/location prediction. x-axes: number of steps.]
A single run. The output for the sample test run where the character spillage achieves the Avatar quest (Table 1) is shown in the graphs in Figure 3(a)-(c). The x-axes for these graphs show the number of time steps in the DBN, which correspond to the number of actions performed by the user. The y-axes show the current beliefs for the user's current quest (Q'), next location (Ln+1), and next action (An+1), respectively. Figure 3(a) shows that initially the system predicts a nearly zero probability that the Avatar quest is being attempted. This reflects the prior probability that the Avatar quest follows the null quest; the CPD entry for Pr(Q' = Avatar | Q = null) is 0.04985. The predicted probability begins to rise after about 10 steps, becoming close to 1 around step 15, and remaining there until the quest is completed at step 62. The shape of this graph is typical of the more successful output runs. Less successful runs take longer for the prediction to increase (Figure 4(a,d)), exhibit more fluctuations (Figure 4(b,c,f)), and a small percentage of the runs fail to predict the quest being attempted (Figure 4(e)).

The absolute probabilities for the next-location and next-action predictions (bottom curve of the graphs in Figure 3(b,c)) are not as high as those for the next-quest prediction. This is to be expected in light of the large number of possible actions and locations. However, the quantities of interest are the ratios of the predicted probabilities of the actual location and action to the maximum predicted probabilities of any location and action, respectively. These ratios are represented by the top curve of the graphs in Figure 3(b,c). From these curves it is quite clear that for the vast majority of the Avatar quest, the locations visited by the player are those predicted by our model with the highest probability. Our action predictions are less successful than our location predictions. Nonetheless, in a substantial part of the quest our model assigns the maximum probability to the action that is actually performed by the player.
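These two quantities (the probability assigned to what actually happened next, and its ratio to the largest predicted probability) can be computed from any predictive distribution, such as the ones returned by the predict_next sketch above; a small sketch, with names of our own choosing:

```python
def prediction_scores(predicted_dist, actual_value):
    """Return (p_actual, p_actual / p_max): the probability the model assigned
    to the value that actually occurred next, and its ratio to the highest
    probability assigned to any value (the two curves of Figure 3(b,c))."""
    p_actual = predicted_dist.get(actual_value, 0.0)
    p_max = max(predicted_dist.values())
    return p_actual, (p_actual / p_max if p_max else 0.0)
```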
[Figure 4. Typical quest prediction curves based on 80% training: (a) Go quest, Gargron character; (b) Go quest, Panawe character; (c) Orc slaying quest, Zshaka character; (d) Wood chopping quest, Thecrow character; (e) Teddy bear quest, Whisper character; (f) Go quest, Bubbles character. x-axes: number of steps; y-axes: predicted probability of achieved quest.]
Ranking of candidate quests. The system maintains beliefs about each quest the user may attempt. We rank these beliefs and report on the ranking of the actual quest the user achieves in a given run. Figure 5(a) shows the percentage of runs where the actual quest was predicted in the top N quests, where N = 1 to 3. In order to compare across runs where the number of steps (actions recorded) varies, the x-axis is the percentage of actions taken to complete a quest. The y-axis is the percentage of runs. We assess the quality of these results by comparing them to the first-order Markov prediction obtained using only the previous quest; the horizontal lines in the graph show the prediction based purely on the frequency with which the actual quest was achieved given the previous quest. As N increases, the percentage of runs is higher, since N = i + 1 subsumes N = i. For each value of N, the predictions made by our model quickly rise above the Markov prediction, and continue to improve as quest completion progresses. Table 2 shows the percentage of correctly predicted quests at different stages prior to quest completion. For instance, when 80% of a quest has been completed, we are correctly predicting the quest being attempted in 74.84% of the runs; this quest is in the top two predicted quests in 81.41% of the runs, and in the top three predicted quests in 84.13% of the runs.

Varying the size of the training set. The final experimental results show the effect of varying the size of the training set on the predictive power of our DBN model. Figure 5(b) shows the effect of training with 5%, 20% and 80% of the data. As expected, 5% training produces the worst results. As the size of the training set increases, the predictions improve. Note that the results do not change substantially between a training set comprising 20% of the data and one comprising 80%.
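The top-N ranking reported in Table 2 can be computed from the quest belief at a chosen point of each test run; a sketch, where the rank lists are assumed to have been collected by replaying each test run through the DBN:

```python
def quest_rank(quest_belief, actual_quest):
    """1-based rank of the quest actually achieved, ordered by predicted probability."""
    ordered = sorted(quest_belief, key=quest_belief.get, reverse=True)
    return ordered.index(actual_quest) + 1

def top_n_percentage(ranks, n):
    """Percentage of test runs whose eventual quest has rank <= n, where all
    ranks were taken at the same percentage of quest completion."""
    return 100.0 * sum(r <= n for r in ranks) / len(ranks)

# e.g. top_n_percentage(ranks_at_80_percent, 2) corresponds to the
# "in top 2 quests" entry of Table 2 at 80% of quest completion.
```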
6 Discussion and Future Work

We have described a Dynamic Bayesian Network which predicts a user's next location, action and quest based on a training corpus. The structure of the network itself is fairly simple, but the number of possible values of each node makes its training and evaluation a computationally complex task. As indicated in Section 2, we do not learn the structure of the network. This aspect
[Figure 5. (a) Top N quest predictions for 80% training data (the horizontal lines represent the Markov predictions); (b) Quest predictions with different training set sizes (5%, 20% and 80%). x-axes: percentage of actions until quest completion; y-axes: percentage of runs.]

Table 2. Percentage of runs where the eventual quest is in the top N quests at X% of quest completion.

Prediction        Percentage of quest completion
                  70%     80%     90%     95%     100%
in top quest      69.07   74.84   78.69   83.65   89.10
in top 2 quests   77.56   81.41   85.74   89.26   95.35
in top 3 quests   81.25   84.13   88.30   91.99   96.63
of our approach is domain dependent. However, our approach is sufficiently general to support additional domains, such as the WWW and Unix, which have similar features to those of the MUD.

An important feature of our model is that, due to its probabilistic training, its predictions are based on actions that are normally performed to achieve a goal, rather than on actions that necessarily advance a user towards the achievement of a goal. This means that actions that are necessary to achieve a goal, and are therefore performed by a large number of users, have a large effect on the predictions. However, the performance of a few extraneous actions does not preclude the correct prediction of a user's goal.

As seen in the previous section, the results obtained with this network are encouraging. However, these results were obtained under certain user-related and domain-related simplifying assumptions. Examples of the former are: all users complete a quest, all users have similar profiles, and all users attempt one quest at a time. Among the latter we have: the domain has certain independence relations, and only certain types of data are available. In the future we intend to extend our mechanism so that it can handle the relaxation of these assumptions.

The first two assumptions will be relaxed simultaneously by including non-quest runs into our observations, and using a classification mechanism to build user profiles which reflect the kinds of activities performed by different types of users. A Dynamic Belief Network which incorporates a user's class will then be built and trained from this data. The plan recognition task will involve the identification of a user's profile on the basis of his/her current actions, and the prediction of the actions, locations and objectives of this user in the context of the identified
profile. The relaxation of the third assumption requires the extension of our mechanism so that it can handle conjunctive goals.

In order to relax the first domain-related assumption, we intend to investigate higher order models and networks with different connectivity; e.g., in the current model, actions and locations are only connected through quests (Figure 1). Establishing a link from Li to Ai would reflect the influence of a location on the possible actions that can be performed in it, but would at the same time increase the complexity of the belief updating process. In addition, we intend to consider the hierarchical structure of the location data, i.e., the fact that certain locations are part of the market place, others are part of the inn, etc. The consideration of this factor makes our model more domain dependent. However, this drawback may be offset by an increased accuracy in next-location predictions.

Finally, in order to relax the second domain-related assumption, we have recently started collecting additional data, beyond those originally provided at the beginning of this research, e.g., horizontal movements and the health and wealth of the players. The availability of these data will allow us to develop more detailed models, and to test them against the baseline results obtained with our current model.
References

Albrecht, D.W., Nicholson, A.E., Zukerman, I., and Bud, A. (1997). A Bayesian model for plan recognition in large, complex domains. Technical report, Department of Computer Science, Monash University, Victoria, Australia.
Allen, J.F., and Perrault, C. (1980). Analyzing intention in utterances. Artificial Intelligence 15:143–178.
Bauer, M. (1996). Acquisition of user preferences for plan recognition. In UM96 – Proceedings of the Fifth International Conference on User Modeling, 105–112.
Cañamero, D., Delannoy, J., and Kodratoff, Y. (1992). Building explanations in a plan recognition system for decision support. In ECAI92 Workshop on Improving the Use of Knowledge-Based Systems with Explanations, 35–45.
Carberry, S. (1990). Incorporating default inferences into plan recognition. In AAAI90 – Proceedings of the Eighth National Conference on Artificial Intelligence, 471–478.
Charniak, E., and Goldman, R.P. (1993). A Bayesian model of plan recognition. Artificial Intelligence 64(1):50–56.
Dagum, P., Galper, A., and Horvitz, E. (1992). Dynamic network models for forecasting. In UAI92 – Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence, 41–48.
Dean, T., and Wellman, M.P. (1991). Planning and Control. San Mateo, California: Morgan Kaufmann.
Forbes, J., Huang, T., Kanazawa, K., and Russell, S. (1995). The BATmobile: Towards a Bayesian automated taxi. In IJCAI95 – Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1878–1885.
Good, I.J. (1965). The Estimation of Probabilities: An Essay on Modern Bayesian Methods. Research Monograph No. 30. Cambridge, Massachusetts: MIT Press.
Heckerman, D. (1995). A tutorial on learning Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research.
Huber, M.J., Durfee, E.H., and Wellman, M.P. (1994). The automated mapping of plans for plan recognition. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, 344–350.
Lesh, N., and Etzioni, O. (1995). A sound and fast goal recognizer. In IJCAI95 – Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1704–1710.
Lesh, N., and Etzioni, O. (1996). Scaling up plan recognition using version spaces and virtual plan libraries. Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, Washington.
Litman, D., and Allen, J.F. (1987). A plan recognition model for subdialogues in conversation. Cognitive Science 11:163–200.
Nicholson, A.E., and Brady, J.M. (1994). Dynamic belief networks for discrete monitoring. IEEE Transactions on Systems, Man and Cybernetics 24(11):1593–1610.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. San Mateo, California: Morgan Kaufmann.
Pynadath, D., and Wellman, M. (1995). Accounting for context in plan recognition with application to traffic monitoring. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 472–481.
Quinlan, J.R. (1983). Inferno: A cautious approach to uncertain inference. The Computer Journal 26:255–269.
Raskutti, B. (1993). Handling Uncertainty during Plan Recognition for Response Generation. PhD thesis, Monash University, Victoria, Australia.
Raskutti, B., and Zukerman, I. (1991). Generation and selection of likely interpretations during plan recognition. User Modeling and User-Adapted Interaction 1(4):323–353.
Russell, S., Binder, J., Koller, D., and Kanazawa, K. (1995). Local learning in probabilistic networks with hidden variables. In IJCAI95 – Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1146–1152.
Wærn, A., and Stenborg, O. (1995). Recognizing the plans of a replanning user. In Proceedings of the IJCAI-95 Workshop on The Next Generation of Plan Recognition Systems: Challenges for and Insight from Related Areas of AI, 113–118.
Wallace, C., and Freeman, P. (1987). Estimation and inference by compact coding. Journal of the Royal Statistical Society (Series B) 49:240–252.