Towards an Integrated Approach for Task Modeling and Human Behavior Recognition

Martin Giersich, Peter Forbrig, Georg Fuchs, Thomas Kirste, Daniel Reichart, and Heidrun Schumann

Rostock University, Computer Science Department, Albert-Einstein-Str. 21, 18051 Rostock, Germany
{martin.giersich | peter.forbrig | georg.fuchs | thomas.kirste | daniel.reichart | heidrun.schumann}@uni-rostock.de

Abstract. Mobile and ubiquitous systems require task models for addressing the challenges of adaptivity and situation-aware assistance. Today, both challenges are seen as separate issues in system development, addressed by different modeling concepts. We propose an approach for a unified modeling concept that uses annotated hierarchical task trees for synthesizing models for both areas from a common basic description.

Keywords. Task models, human behavior models, dynamic Bayesian networks, user interface design methodology, ubiquitous computing.

1 Introduction

Mobility and ubiquity of computing devices make information technology accessible for user activities that are temporally and, especially, spatially distributed. (We will use the term "task" for such a composite activity.) Computing devices are used "beyond the desktop", in situations where the user focuses on interacting with the real world (e.g., while repairing a machine, while giving a lecture in a smart classroom, during a team meeting, . . . ). In such situations, we wish for systems that are able to assist the user proactively: systems that do not force the user to interrupt his primary task and shift his focus of attention in order to interact with the system.

One example of such proactive assistance is smart environments – smart homes, smart offices, smart meeting rooms, etc. – that support users in their activities through appropriate actuators. For instance, consider a "situation room" or "mission control center" scenario where several users are engaged in cooperative, time-critical decision making (such as disaster mitigation). Here, wall displays may be used alongside standard computer monitors, and even laptops brought in or removed by staff members at arbitrary times. Different specialists on the team require information on different aspects of a problem, or even on different data altogether. A smart room would provide support for these conflicting and dynamically changing information needs by automatically deciding what data to present where, when, and how. This decision is based on the current task at hand (team intention), the capability of the available hardware, and the preferences and needs of the team member(s) interested in the data.

From the viewpoint of user interface development, the consideration of mobile and ubiquitous application scenarios has two important consequences:

1. The set of devices available for interaction may change over time. This raises the challenge of adaptivity: on different devices, the same abstract interaction ("enter phone number") has to be rendered differently in order to make optimal use of the current device's interaction mechanisms – consider for instance keyboard entry vs. speech recognition.
2. The system becomes aware of a user's task and its structure. This creates the opportunity of proactive assistance: if the devices in the user's environment are able to infer her current activity, they can trigger actions (such as providing information) without explicit user interaction.

In order to address these consequences, the same basic concept is being investigated in current research: the system is provided with an explicit model of the user's task, a task model. However, the specific kinds of task models used for addressing the above two challenges differ significantly. In the area of adaptive user interfaces, hierarchical task graphs are used: tasks can be refined into sub-tasks and their ordering constraints. In the area of proactive assistance, probabilistic models – such as hidden Markov models – are employed for describing human behavior, by specifying the probability of the different possible actions a user may take in a certain situation. Consequently, in both areas models are currently developed independently. Intuitively, one would assume that a model which specifies the temporal orderings of subtasks should have some relation to the model that specifies what a user will probably do next.

So it might be possible to generate both models (or at least, "templates" for both models) from the same basic description. It is this intuitively plausible idea we want to discuss in this paper. The further structure of this paper is as follows: in Sec. 2, we provide an in-depth discussion of task models in general and hierarchical task models specifically. In Sec. 3, we outline activity recognition using probabilistic models of human behavior. Then we describe a simple strategy for synthesizing parts of such probabilistic models from extended hierarchical task models in Sec. 4. We discuss the viability of this approach and indicate further research in Sec. 5.

2 Modeling Tasks

By a task model we mean a breakdown of a composite activity into individual atomic steps, between which a partial order may be defined, roughly speaking: a “plan”. The term “action” will denote an atomic step of a task. The concept of task models originates from two research areas:

In cognitive psychology, task models have been developed as a means for formally describing human problem-solving behavior. The well-known GOMS model [1] is a very good example of this class of models. It is the foundation of several proposals for model-based user interface design (for instance, [2]). These models can be used in two ways: (i) for analyzing the cognitive complexity of a given user interface, and (ii) for designing user interfaces by first developing a model of the task at hand and then choosing appropriate dialogue elements for the individual atomic activities.

In signal processing, "task models" have been developed as a means for estimating the actual behavior of a signal source, for which only incomplete and noisy observations are available. The fundamental algorithmic approach is Bayesian filtering. A Bayesian filter requires a hypothesis about a signal source's behavior repertoire, a hypothesis about which behavior will cause what observation, and a set of noisy observations. Based on this information, the filter will yield the most probable explanation for the observed data – i.e., the most probable behavior of the signal source given the observations (see, e.g., [3] for an introduction to Bayesian filtering).

The relation between the two origins of task models becomes clear once we try to track a human user: the observations may be data from (noisy) location sensors (e.g., beacon-based, GPS, UbiSense), accelerometers attached to the user's body (or her mobile phone), or information about objects touched by the user (using, e.g., RFID tags). The challenge is then to infer which task the user is trying to perform from a given set of sensor data. Clearly, a model correctly describing a human's strategy for achieving a certain goal is an ideal hypothesis for a Bayesian filter: given a task model and a set of sensor readings, a Bayesian filter will output the user's most probable goal.
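The three ingredients of a Bayesian filter named above – a behavior hypothesis (transition model), an observation model, and noisy observations – can be illustrated with a minimal discrete filter. The states, probabilities, and identifier names below are purely illustrative assumptions, not taken from this paper:

```python
# Minimal discrete Bayesian filter sketch. The belief over candidate user
# activities is pushed through a transition model (the behavior hypothesis)
# and then reweighted by an observation model.

STATES = ["walking", "standing"]

# Behavior hypothesis: P(next state | current state)
TRANSITION = {
    "walking":  {"walking": 0.8, "standing": 0.2},
    "standing": {"walking": 0.3, "standing": 0.7},
}

# Observation model: P(sensor reading | state), e.g. a coarse accelerometer bin
OBSERVATION = {
    "walking":  {"high_accel": 0.9, "low_accel": 0.1},
    "standing": {"high_accel": 0.2, "low_accel": 0.8},
}

def bayes_filter_step(belief, observation):
    """One predict/update cycle; returns the normalized posterior."""
    # Predict: push the current belief through the transition model
    predicted = {s: sum(belief[p] * TRANSITION[p][s] for p in STATES)
                 for s in STATES}
    # Update: weight each state by the likelihood of the observation
    posterior = {s: predicted[s] * OBSERVATION[s][observation] for s in STATES}
    norm = sum(posterior.values())
    return {s: v / norm for s, v in posterior.items()}

belief = {"walking": 0.5, "standing": 0.5}
for z in ["high_accel", "high_accel", "low_accel"]:
    belief = bayes_filter_step(belief, z)
```

After the final low-acceleration reading, the posterior shifts towards "standing" – the filter yields the explanation that is most probable given all observations so far, which is exactly the role a task model plays as the filter's behavior hypothesis.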
In essence, from the viewpoint of mobile and ubiquitous computing, task models have two important uses:
– As a means for deriving the dialogue structure of a mobile human-computer interface (hierarchical task models).
– As a means for providing activity support for users (and teams) through proactive assistance (probabilistic behavior models).

With respect to hierarchical task models, one of the most popular notations is the so-called ConcurTaskTree notation (CTT notation) [2]. In this notation, a compound activity is represented by a task tree. Each node in the tree represents a task; composite tasks may be broken down into subtasks. For each task node it may be specified whether the activity is executed by the user, by the application, or by an interaction between user and application. In addition, the possible execution sequences of a composite task's sibling nodes may be further constrained by temporal relations such as "α |=| β" (α and β may be executed in any sequence) or "α >> β" (α has to be executed before β). Fig. 1 presents a typical task tree, describing in a rather simplistic way the agenda of a meeting that consists of three talks by users a, b, c (represented by task nodes A, B, and C, respectively) and a discussion (task node D):

A |=| B |=| C >> D

Fig. 1. Task model specifying the schedule of a meeting.

The talks can be given in any order, which is specified by the order-independency relation ("|=|"). The discussion can only be performed after all talks have been presented, which is specified by the enabling relation (">>").

Hierarchical task models are used to specify the behavior of users interacting with a software system. They make it possible to describe the basic temporal structure of compound activities. For inferring the activity of a user from sensor data, we need additional information: a specification of how probable a certain execution sequence is. Next, we will look at a current approach to this problem.
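The constraints of Fig. 1 are small enough to enumerate exhaustively. A minimal sketch – the encoding and function names are our own, not part of the CTT notation:

```python
from itertools import permutations

# Toy encoding of the Fig. 1 task model: three order-independent talks (|=|)
# followed by a discussion that is only enabled afterwards (>>).

TALKS = ["A", "B", "C"]   # order-independent siblings
FINAL = "D"               # enabled only after all talks

def valid_sequences():
    """Enumerate every execution sequence the task model permits."""
    return [list(p) + [FINAL] for p in permutations(TALKS)]

def is_valid(seq):
    """Check a sequence against the constraints: each talk once, D last."""
    return sorted(seq[:-1]) == sorted(TALKS) and seq[-1] == FINAL

seqs = valid_sequences()
```

The model permits exactly the 3! = 6 orderings of the talks, each followed by D; a sequence such as A, D, B, C violates the enabling relation and is rejected.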

3 Inferring Intentions

As outlined above, computing the user's current activity from sensor data requires a task model that makes it possible to assess the plausibility of sensor data given a specific activity. A system can then try to identify the user's current task by selecting the task whose action sequence is most plausible with respect to the observed sensor data.

Bayesian filtering for identifying a user's current task has been successfully used in several projects that aim at supporting user activities in classrooms, meeting rooms, and office environments [4–6]. Here, dynamic Bayesian networks (DBNs) are increasingly investigated for modeling a user's activities [7, 8]. In our own work, we look at using DBNs for inferring the current task and actions of a team of users. Given (noisy and intermittent) sensor readings of the team members' positions in a meeting room, we are interested in inferring the team's current objective – such as having a presentation delivered by a specific team member, a moderated brainstorming, a round-table discussion, a break, or the end of the meeting. The basic structure of the DBN we propose for modeling the activities of such a team is given in Fig. 2.

In general, a DBN consists of a sequence of time slices, where each time slice describes the possible state of a system at a given time t. A time slice consists of a set of nodes that represent the system's state variables at that time. State variables may be connected through directed causal links. A connection such as X → Y means that the current value of Y depends on the current value of X. This dependency is described by a conditional probability table (CPT), such as

               X = 0   X = 1
P(Y = 0 | X)    0.9     0.3
P(Y = 1 | X)    0.1     0.7
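Such a CPT can be written out directly as a nested table. The following sketch (identifier names are ours) represents the table above, checks that each row is a proper probability distribution, and samples from it:

```python
import random

# The CPT for X -> Y as a nested mapping: CPT[x][y] = P(Y = y | X = x).
CPT = {
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.3, 1: 0.7},
}

def sample_y(x, rng):
    """Draw Y from the conditional distribution P(Y | X = x)."""
    return 0 if rng.random() < CPT[x][0] else 1

# Every row of a CPT must be a probability distribution (sum to 1).
for row in CPT.values():
    assert abs(sum(row.values()) - 1.0) < 1e-9

# Empirically, samples drawn with X = 1 yield Y = 1 about 70% of the time.
rng = random.Random(0)
freq_y1 = sum(sample_y(1, rng) for _ in range(10_000)) / 10_000
```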

[Figure 2: a two-slice DBN over slices t − 1 and t, with a team intention node T in each slice and, for each user i ∈ {a, b, c}, assignment nodes G^(i), achievement nodes U^(i), action nodes A^(i), and sensor nodes S^(i), connected by intra-slice and inter-slice causal links.]

Fig. 2. Two-sliced dynamic Bayesian network (DBN) modeling team intention inference. It shows the intra-slice dependencies between observable (double-contoured) and hidden variables, as well as the inter-slice dependencies between consecutive states.

which in this example says that, in case X is 1, the value of Y will be 0 with a probability of 0.3 and it will be 1 with a probability of 0.7. (If X is 0, Y will be 1 with a probability of 0.1 and 0 with a probability of 0.9.)

Causal links may connect nodes within a time slice; they may also connect nodes between time slices – the latter is used to express the fact that the state at time t depends on the previous state at time t − 1. (We here consider only DBNs that are first-order Markovian – i.e., where the state at time t depends only on the state at time t − 1 and no earlier states.)

The classical problem for Bayesian networks in general and for DBNs specifically is that not all node values may be known at a given time. Some nodes may be directly observable, while other nodes are hidden. Bayesian inference then tries to infer the probability distribution over the hidden nodes' values from the values of the known (observable) nodes. In our network given in Fig. 2, only the nodes labeled S are observable (they represent sensor data reporting the position of a user); the other ones are hidden.

With this network, we try to model the behavior of a team of three users during a meeting. At the top level, the team node T_t represents the current team intention. The team's intention at time t depends on what the team has already achieved (T at time t − 1, i.e., T_{t-1}) and on what the users i are currently trying to achieve (the U_t^(i) nodes, i ∈ {a, b, c}). If all users have achieved their individual assignment for the current team intention, the team T will adopt a new intention. This may cause new assignments to the users. The G_t^(i) nodes represent these – possibly new – assignments. So at each time slice, the team looks at what the users have achieved so far and then decides what the users

should do next. The CPT of node T therefore represents the negotiation process by which the team members agree on the next joint activity. For instance, if the team decides that the next activity should be the presentation of user a, it would assign to user a the task to go to the speaker stand and deliver his speech, while users b and c would be assigned the task to take a seat in the audience.

Whether a user i has achieved his assignment at time t – given by U_t^(i) – depends on the user's current action (A_t^(i)) and his previous assignment G_{t-1}^(i). The A_t^(i) nodes record the current state of the user's action (e.g., the user's current position and velocity in case he has to reach the speaker stand in order to achieve his assignment). What the user is doing at time t depends on his previous action and assignment – A_{t-1}^(i) and G_{t-1}^(i). Finally, the sensor observations of user i at time t – the nodes S_t^(i) – depend on the user's activities at that time.

Note that these sensor nodes are the only observable nodes in our model: we cannot directly look into the minds of the users to observe their joint intention. Rather, we take the available sensor data – the set of S_t^(i) values for the times up to t – and try to find from these the sequence of values for T_s, s ∈ {1 . . . t}, that best explains the observed data – we try to estimate the team's negotiations from the observable behavior of the team members.

Once a probabilistic model is available, it allows us to infer user and team intentions. A substantial question is now of course: Where do we get such a model? Is it necessary to create it manually, from scratch, or can we synthesize it from existing information – such as from a hierarchical task model? This question will be addressed in the next section.

4 Synthesizing Probabilistic Models

In order to define a complete probabilistic model, sub-models have to be provided for the following three aspects:
– How a team produces a sequence of joint intentions (team model)
– Which actions a user performs in response to a joint intention, i.e., an assignment (user model)
– Which sensor data are caused by which actions (sensor model)

We will now look at the first topic and discuss how task models such as task trees can be used to simplify the definition of such a model. First, the CPT of a T node basically looks as follows, with h = T_{t-1}.history and α = T_{t-1}.activity, conditioning on the U_t^(i).done values, i ∈ {a, b, c}:

                                 ∀ i: U_t^(i).done = true    ∃ i: U_t^(i).done = false
P(T_t.history = h ∪ {α})                    1                            0
P(T_t.history = h)                          0                            1
P(T_t.activity = α)                         0                            1
P(T_t.activity = ξ), ξ ≠ α      m_model(T_t.history, ξ)                  0

[Figure 3: a Markov model whose states are the histories ∅, {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}, and {A, B, C, D}; edges carry transition probabilities derived from the priorities, e.g., 0.9 for ∅ → {A}, 0.09 for ∅ → {B}, and 0.01 for ∅ → {C}.]

Fig. 3. Markov model of agenda-driven team process.

The history slot of the T node records the team's previous activities¹; the activity slot is the team's current goal that the users try to jointly achieve through their individual assignments. If all users are done with their assignment, the team will add the current activity to its history and then choose a new activity ξ. Otherwise, it will continue its current activity α.

In this CPT, m_model is the essential point. m_model is a function that, given a history h and an action ξ, yields the probability that the team will try ξ after h. m_model describes our knowledge about what a team will most probably do in a certain situation. For instance, if the possible actions of a team are {A, B, C, D} and if we know that the team has an agenda stating the sequence of actions ⟨A, B, C, D⟩, m_model should assign the highest probability to action B when given the history {A} – modeling the prejudice that a team tends to follow its agenda. However, m_model should also assign non-zero probabilities to the other actions in order to account for the possibility of deviations from the agenda. A possible model for a simple four-step agenda stating "A, B, C may come in any order, but most probably in the order ⟨A, B, C⟩, while D must be the last action" is given in Fig. 3.

m_model essentially specifies a Markov model where the states are partial histories and the edges are transitions between histories. The problem here is how such a model can be specified efficiently: the number of states (histories) grows exponentially with the number of available actions! Our proposal for solving this problem is to utilize hierarchical task models for defining the structure and transition probabilities of a DBN. Specifically, we use an annotated CTT graph for generating an initial proposal for m_model.

Basically, a task model M defined over a set of actions A specifies a directed acyclic graph on possible execution histories h ∈ 2^A, with the additional constraint that (h, h′) ∈ M ⇒ ∃ α ∈ A : h ∪ {α} = h′.

¹ Given a set of actions A, an execution history is the set of actions that have already been performed. The set of all execution histories is the power set of A, which we denote by 2^A. (This model makes the simplifying assumption that the exact sequence of actions is not important; however, it is easy to change the history model to a sequence model.)

This means: in a task model M, a history h′ directly results from a history h through the execution of a single action α. The empty history ∅ is the root of this graph.² For a given history h, the set C(h) denotes the set of actions that may directly follow this history. C(h) is defined as follows:

C(h) = { α ∈ A | (h, h ∪ {α}) ∈ M }

Clearly, the graph M directly represents the structure of a corresponding Markov model. At this point, the question to be addressed is how to provide initial proposals for the transition probabilities of this Markov model. The idea is to allow the developer of a task model to annotate a task tree with additional information from which these initial proposals can be derived. One straightforward approach is to annotate each sibling action α with a "priority" prio(α), a number that indicates how important an early execution of this node is in relation to the other siblings – as outlined in Fig. 4. For independently ordered tasks (order relation |=|), the priorities indicate the probabilities of being executed first. Then, for a given history h and a possible extension ξ ∈ C(h), the probability of a transition from h to h ∪ {ξ} is calculated from the priorities by:

P(h ∪ {ξ} | h) = prio(ξ) / Σ_{α ∈ C(h)} prio(α)
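This priority formula can be sketched directly for the annotated agenda of Fig. 4 (prio(A) = 90, prio(B) = 9, prio(C) = 1, with D enabled last). The helper names below are our own; the enabling of D is hard-coded for this one example rather than derived from a general task-tree encoding:

```python
# Transition probabilities derived from priority annotations, following
# P(h ∪ {ξ} | h) = prio(ξ) / Σ_{α ∈ C(h)} prio(α).

PRIO = {"A": 90, "B": 9, "C": 1}

def candidates(history):
    """C(h): talks not yet given; D only once all talks are done (>>)."""
    remaining = [a for a in PRIO if a not in history]
    return remaining if remaining else ["D"]

def transition_prob(history, action):
    """Probability of extending history h with action ξ."""
    cand = candidates(history)
    if action not in cand:
        return 0.0
    if action == "D":          # D is the only candidate once enabled
        return 1.0
    total = sum(PRIO[a] for a in cand)
    return PRIO[action] / total

# From the empty history, A is by far the most probable first action:
p_first_a = transition_prob(set(), "A")   # 90 / 100 = 0.9
```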

The resulting Markov model for the hierarchical task model in Fig. 4 is the one shown in Fig. 3. In this example, the probability that a meeting starts with presentation A is 0.9. Accordingly, it is 0.09 for presentation B. The probability that a meeting starts with the third talk C is 0.01. If the meeting has started with talk B, the probabilities for the two possible following transitions to {B, C} and {A, B} are given by

prio(C) / (prio(A) + prio(C)) ≈ 0.01   and   prio(A) / (prio(A) + prio(C)) ≈ 0.99.

Note that the most probable path through the generated Markov model is indeed the one following the agenda: ∅ → {A} → {A, B} → {A, B, C} → {A, B, C, D}. Also, if an action is taken out of order, the Markov model specifies that the team will try to return to the agenda: when the meeting has been started with B, the most probable following action will be to return to the planned sequence by executing A. So the generated Markov model represents the intuition behind the task-tree annotations.

We have just shown that a proposal for a probabilistic model of user behavior can be generated from an annotated hierarchical task model.
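The unfolding of the annotated model of Fig. 4 into the Markov model of Fig. 3 can be sketched as follows. This illustrative code (identifier names are ours, and the greedy path extraction is valid here only because the agenda-following edge dominates at every step) also reproduces the return-to-agenda behavior just described:

```python
# Unfold the annotated task model into a Markov model over histories:
# states are subsets of executed actions, edges carry priority-derived
# probabilities, and D is enabled only after all talks.

PRIO = {"A": 90, "B": 9, "C": 1}

def successors(history):
    """Return {next_action: probability} for a given history."""
    remaining = [a for a in PRIO if a not in history]
    if not remaining:                       # all talks done: D is enabled
        return {"D": 1.0}
    total = sum(PRIO[a] for a in remaining)
    return {a: PRIO[a] / total for a in remaining}

def most_probable_path():
    """Follow the highest-probability edge from the empty history."""
    history, path = frozenset(), []
    while "D" not in history:
        probs = successors(history)
        best = max(probs, key=probs.get)
        path.append(best)
        history = history | {best}
    return path

# The extracted path reproduces the agenda <A, B, C, D>; after an
# out-of-order start with B, the dominant next action is A (return to agenda).
agenda_path = most_probable_path()
after_b = successors(frozenset({"B"}))
```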

² If histories are represented by sequences instead of sets, this graph is a tree.

A [prio 90] |=| B [prio 9] |=| C [prio 1] >> D

Fig. 4. Extended task model specifying the schedule of a meeting, with priority annotations shown in brackets.

Therefore, we claim that it is at least possible to exploit established user interface design methodology – task-tree modeling – for additionally enabling some aspects of proactive assistance in ubiquitous computing systems. Below, we discuss some of the research issues that arise from this approach.

5 Discussion and Outlook

With the simple strategy outlined above, we have shown that it is possible to synthesize Markov models from annotated hierarchical task trees that capture the probability of task execution in a team.

An interesting question is how well the intended Markov model can be specified by the priority annotations. Not all possible distributions for the transition probabilities can be generated from these task-tree annotations (after all, the number of priority annotations is much smaller than the number of transitions in the Markov model). However, we think it is sufficient if the generated model is approximately correct: the exact transition probabilities are not known in advance anyway. They have to be learned from observations of real user behavior. The generated probabilities only have to be accurate enough to permit the system reasonable behavior right from the start, before training data is available (of course, the better the initial estimate, the less training data will be required). The salient point is a useful definition of "reasonable" and "approximate". We think it is possible to provide such definitions; this will be part of our future research.

The specific task-tree annotations and the accompanying probability computation given in Sec. 4 implicitly assume that the team uses a specific agenda-management strategy. Specifically, the synthesized model assumes that a team prefers to execute actions in the order of their original priority, independent of the history. We call this a return to agenda strategy. Sometimes, teams might use other strategies. One example is continue with the successor: if the meeting had started with talk B, the most probable next action would be C. This means the original "successor" of the action actually executed is also the most probable following activity. Another strategy of a team could be to stick to a timetable and to execute each action as close as possible to the original plan.
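The contrast between the two named strategies can be made concrete with a small sketch. The decision rules and function names below are our own illustrative reading of the strategies, not the paper's formalization:

```python
# Two agenda-management strategies for the agenda <A, B, C> with D last,
# asked for the preferred next action after an out-of-order history.

AGENDA = ["A", "B", "C"]

def next_return_to_agenda(history):
    """Return to agenda: prefer the earliest agenda item not yet executed."""
    return next((a for a in AGENDA if a not in history), "D")

def next_continue_with_successor(history, last_action):
    """Continue with successor: prefer the agenda successor of the last action."""
    idx = AGENDA.index(last_action)
    for a in AGENDA[idx + 1:] + AGENDA[:idx]:  # successors first, then wrap
        if a not in history:
            return a
    return "D"

# After an out-of-order start with talk B, the strategies diverge:
choice_rta = next_return_to_agenda({"B"})              # "A": back to the plan
choice_cws = next_continue_with_successor({"B"}, "B")  # "C": B's successor
```

This divergence is exactly why the two strategies would require different priority annotations in the task tree.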
Different strategies may require different annotations to a hierarchical task model. (For instance, in the case of continue with successor, one needs to provide the priority annotations at the parent task level rather than at the sibling level.) In addition, it may be conceivable to provide a set of mechanisms for inheriting such annotations within a task tree. In order to render task-tree annotations usable, the set of annotation mechanisms has to be kept as small as possible. Therefore, we need to identify a set of annotations that makes it possible to specify the typical team strategies for agenda management with sufficient precision. Future research has to identify which strategies are typical, what precision is required, and which set of task-tree annotations is able to capture the required information in a usable way for real-life problems.

To summarize, we have shown that task models are an important tool for addressing salient challenges of mobile and ubiquitous system design: adaptivity and proactive assistance. The models employed today in these areas use quite different modeling concepts, which leads to duplication of work. We then argued that it is possible to automatically synthesize one kind of model (probabilistic behavior models) by adding simple annotations to the other kind (hierarchical task models). Therefore, it seems possible to generate proactive assistance for every application that already provides a suitably elaborate task model. We think that further research in this area is important for evaluating the potential of this approach and for rendering its benefits accessible to ubiquitous system development practice.

References

1. Card, S.K., Moran, T.P., Newell, A.: The psychology of human-computer interaction. Erlbaum (1983)
2. Mori, G., Paterno, F., Santoro, C.: CTTE: support for developing and analyzing task models for interactive system design. IEEE Trans. Softw. Eng. 28(8) (2002) 797–813
3. Fox, D., Hightower, J., Liao, L., Schulz, D., Borriello, G.: Bayesian filtering for location estimation. IEEE Pervasive Computing 2(3) (2003) 24–33
4. Franklin, D., Budzik, J., Hammond, K.: Plan-based interfaces: keeping track of user tasks and acting to cooperate. In: Proceedings of the 7th International Conference on Intelligent User Interfaces, New York, NY, USA, ACM Press (2002) 79–86
5. Bui, H.: A general model for online probabilistic plan recognition. In: IJCAI '03: Proceedings of the 18th International Joint Conference on Artificial Intelligence (2003) 1309–1315
6. Duong, T.V., Bui, H.H., Phung, D.Q., Venkatesh, S.: Activity recognition and abnormality detection with the switching hidden semi-Markov model. In: CVPR '05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1, Washington, DC, USA, IEEE Computer Society (2005) 838–845
7. Patterson, D.J., Liao, L., Fox, D., Kautz, H.A.: Inferring high-level behavior from low-level sensors. In: Proceedings of UbiComp 2003: The Fifth International Conference on Ubiquitous Computing (2003)
8. Patterson, D.J., Liao, L., Gajos, K., Collier, M., Livic, N., Olson, K., Wang, S., Fox, D., Kautz, H.A.: Opportunity Knocks: A system to provide cognitive assistance with transportation services. In: UbiComp, Springer (2004) 433–450