On Handling Conflicting Input in Context-Aware Applications

Pascal Fleury, Jan Čuřín, Jan Kleindienst, and Robert Kessl
IBM Voice Technologies, Prague, Czech Republic
[email protected]

Abstract. To make tasks simpler for the user, context-aware applications try to infer the current context of the task through a set of sensors. These sensors deliver information that forms the basis of the context model, which helps the application take appropriate actions. Until recently, most context-aware systems and frameworks made the assumption of dealing with accurate sensor data. In this paper, we assume that sensor data can be faulty and unreliable, and investigate the implications. Dealing with unreliable sensory information may lead to contradictory outcomes, which we call conflicting input. We present a taxonomy of such conflicts and analyse the different phases involved in conflict resolution. With the help of simulations using several statistical and rule-based inference toolkits, we illustrate the different facets of conflicts, and show how the different techniques can deal with such unreliable inputs.

1 Introduction

Context-aware applications implicitly detect information related to the task a particular user is trying to achieve. Such information, known as the context, is then used by the application to provide better-adapted feedback or more specific actions. The context itself is inferred from a set of inputs to the system, both from the user herself and from other relevant sensors (time, location, temperature, etc.).

Any system that processes data from independent inputs will eventually encounter situations where the data are conflicting. In other words, some input channels report information that clashes with the corresponding information provided by other input channels, or with the inferred model the system maintains. It is important to realize that such situations are not as exceptional as they may seem at first sight. On the contrary, dealing with inconsistencies means continuously adjusting the level of trust associated with each input channel. Dealing with input inconsistency is thus an inevitable part of the data fusion process, and it should be treated as such. As a matter of fact, in real systems, the challenge of blending noisy and unreliable data from different channels exhibits itself in all kinds of nuances and variants. If left unsolved, it may render the system quite unusable. Hence we need to devise algorithms, techniques and strategies for dealing with conflicting input, in order to alleviate and resolve these inconsistencies.

In this paper, we analyse conflicts in the input data of context-aware applications. We believe that conflicts in a context-aware system, whether among inputs or among instances of inferred context, can be classified into a taxonomy. This taxonomy then helps us decompose the process that leads to these conflicts.

Let us define the terms that will be used in the following discussion. The sensing infrastructure (also called the perceptual component layer), together with the context modeling infrastructure (a.k.a. the situation model), is responsible for recognizing the situations that happen in the physical setting being observed. Examples of situations are: someone entered a door, a speaker appeared at the whiteboard, a coffee break starts in the meeting. The recognized situations are used as one of the inputs for the application logic to perform certain tasks, e.g. to display a professional record of the person who entered the door.

This paper is subdivided into the following sections: first, we review related work in Section 2. In Section 3, we describe the different types of conflicts and propose a taxonomy. In Section 4, we go over the different aspects of dealing with conflicts in a system. Then, in Sections 5 and 6, we describe some of our experiments in handling conflicting inputs in real systems, with some preliminary results.

2 Related Work

Ubiquitous computing research, as described by Weiser [20], has led to a number of context-aware systems and frameworks (see Baldauf's paper [3] for a recent overview). These systems collect information through sensors and deduce additional context information from the sensor data, or through inference given a model of the universe being considered. Context has been notoriously difficult to define in an unambiguous way, and the most appropriate definition is probably that of Dey [7]: context is "any information that can be used to characterize the situation of entities (i.e. whether a person, place or object) that are considered relevant to the interaction between a user and an application, including the user and the application themselves". A typical choice for describing context information is a subject–predicate–value representation, as is the case in Gu's system [12] and Dey's Context Toolkit [2] (see the sketch at the end of this section).

Unfortunately, many of these early systems [5, 13, 17, 19] made the simplifying assumption that sensed data is correct. Among the first to consider dealing with imperfectly sensed data were Dey et al. [8, 9], who proposed a scheme for disambiguating the sensed information by requesting additional user input. Loke [18] proposed using the consequence of a particular action in a system's decision to perform that action: he uses an argumentation approach, and every decision is weighted by the severity of the consequence of the action. Later, Gu et al. [11] proposed a Bayesian approach to dealing with context information, but more because it enables the representation of causality between events than because of its resilience to imperfectly sensed data. Biegel et al. [4] used Bayesian networks to fuse information from several sensors, in an attempt to reduce the computational complexity of distributed context extraction when many inputs are present.
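The sketch below is a hypothetical illustration of the subject–predicate–value style of context representation mentioned above; the class name and the example facts are assumptions for illustration and are not code from Gu's system or the Context Toolkit.

```python
# Hypothetical subject-predicate-value context facts (not an actual API).
from typing import NamedTuple


class ContextTriple(NamedTuple):
    subject: str     # the entity the statement is about
    predicate: str   # the property or relation
    value: object    # the sensed or inferred value


facts = [
    ContextTriple("person:alice", "locatedIn", "meeting_room_1"),
    ContextTriple("meeting_room_1", "noiseLevel", 0.7),
    ContextTriple("person:alice", "activity", "presenting"),
]
```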

3 Taxonomy of conflicts

The basic assumption in most systems is that sensors and perceptual components deliver reliable information at known or regular time intervals. However, more often than not, the delivered information is very noisy, and using this raw information may result in false context inference. As examples for a meeting setup, as is the case in the CHIL project [1], here are a few of the inconsistencies that may occur in the system:

1. two body trackers each provide a different count of people in the room, e.g. one tracker reports the room as empty, while the other claims there are three people in it;
2. a body tracker reports that there is no one in the room, while the acoustic environment detector reports the sound of human steps and clapping of hands;
3. the speech recognizer claims there is silence in the room, while the acoustic speech detector reports human speech;
4. a body tracker reports a person leaving the room, while the situation model reports that the room is already empty;
5. one meeting detector indicates the start of the meeting, while the other reports that the meeting has not started yet.

Cases 1-3 illustrate situations where two or more perceptual components send conflicting signals. Case 4 documents a situation where a perceptual component sends a signal that is inconsistent with the state of the situation model. Case 5 captures a case where two context-interpreting components disagree on the outcome. Such differences on the current value of a particular stream of an entity are what we term conflicting input.

As the previous examples show, there is more than one class of conflicts. A classification of the conditions that cause conflicts would thus help in conflict diagnosis and assist in their possible remedy. Such a classification will most likely suggest different mechanisms and strategies for handling different classes of error, based on the conflict cause and its impact. We therefore propose the following taxonomy of the kinds of conflicting inputs a system may have to deal with, shown in Table 1.

Table 1. Taxonomy of conflicting inputs.
Source – A sensor is broken or has lost connectivity, and no data is available;
Data – The reported value is completely out of range, presents abnormal variability or too little precision, or is not updated frequently enough;
Context – The modeling, typically a statistical process, may render a particular hypothesis as the most probable one, but it does not match reality.

A common property of all these types of failures is that they cannot easily be detected by the consumers of the data. Detection requires additional knowledge, which is usually available at semantically higher levels in the system. Therefore, the only way for a component to learn about the failure is through feedback from an upper layer that has detected the problem: network partitions are not advertised, broken sensors do not report that they are broken (otherwise we could easily avoid disk crashes and data loss!), detecting a Byzantine failure of a sensor is tricky, and a wrongly modeled situation may only be detected by cross-correlating other aspects of the system.
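As a minimal sketch, the taxonomy of Table 1 could be encoded as follows. The SensorReading structure, the value range, and the staleness threshold are assumptions for illustration, not part of the original system; note that context-level failures cannot be detected from a single reading, which mirrors the point made above about feedback from higher layers.

```python
# Illustrative encoding of the Table 1 taxonomy; all field names and
# thresholds are hypothetical.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional
import time


class ConflictType(Enum):
    SOURCE = auto()   # sensor broken or unreachable, no data at all
    DATA = auto()     # value out of range, too noisy, stale, or imprecise
    CONTEXT = auto()  # inferred hypothesis does not match reality


@dataclass
class SensorReading:
    value: Optional[float]
    timestamp: float
    valid_range: tuple = (0.0, 100.0)
    max_age_s: float = 5.0


def classify_failure(reading: Optional[SensorReading]) -> Optional[ConflictType]:
    """Detect source- and data-level failures from a single reading.

    Context-level failures cannot be seen here: they need feedback from
    the semantically higher layers of the system.
    """
    if reading is None or reading.value is None:
        return ConflictType.SOURCE
    lo, hi = reading.valid_range
    if not (lo <= reading.value <= hi):
        return ConflictType.DATA
    if time.time() - reading.timestamp > reading.max_age_s:
        return ConflictType.DATA
    return None
```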

4 Anatomy of handling conflicts

4.1 Conflicting situation detection

As with humans, a conflicting situation is first perceived, and only then do we react (or decide not to). We model conflicting input resolution as a similar two-phase approach (see Figure 1): first, finding out that there is an actual conflict through analysis of the input data; second, modeling the context according to this situation analysis.

Fig. 1. Stages in conflict detection: first an analysis phase (the media analysis engine), then a decision-taking (or modeling) phase (the context modeling layer) based on the output of the analysis phase.
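The following is a minimal sketch of the two-phase flow of Figure 1: an analysis phase that flags conflicts in the incoming data, followed by a modeling phase that updates the context taking those flags into account. The function names, input fields, and the rule used here are illustrative assumptions only.

```python
# Hypothetical two-phase pipeline: analyse inputs, then model the context.
from typing import Dict, List


def analyse(inputs: Dict[str, float]) -> List[str]:
    """Phase 1: inspect raw input data and report detected conflicts."""
    conflicts = []
    if inputs.get("speech_activity", 0.0) > 0.5 and inputs.get("people_count", 0) == 0:
        conflicts.append("speech_without_people")
    return conflicts


def update_context(context: Dict[str, object],
                   inputs: Dict[str, float],
                   conflicts: List[str]) -> None:
    """Phase 2: model the context, taking detected conflicts into account."""
    if "speech_without_people" in conflicts:
        context["room_state"] = "uncertain"   # defer the decision
    else:
        context["room_state"] = "occupied" if inputs.get("people_count", 0) else "empty"


context: Dict[str, object] = {}
inputs = {"people_count": 0, "speech_activity": 0.9}
update_context(context, inputs, analyse(inputs))
print(context)   # {'room_state': 'uncertain'}
```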

It is quite central to conflict detection that a conflict can only be detected if we have reasonably high trust that the sensed information should be otherwise. This trust can come from different origins:

natural laws – Physical laws state that two objects cannot occupy the same space, or that a loud noise should be sensed as loud on two nearby microphones; natural limits bound the maximum speed of human motion or of temperature drift in a room;
sensor consistency – Information strongly related to a category should have strong correlation across several sensing modalities: speech activity detection should be strongly correlated with human body detection;
general knowledge – In meetings, people generally talk in turns, which is not the case during a break; during a presentation, the lecturer is standing; lighting is changed while giving a projected presentation;
historical behavior – Sensors have known patterns of reporting information; diverging from these patterns may be an indication of a failure of the sensor.

Detecting that there is a conflict can therefore be reduced to detecting a deviation from an expected behavior. This is similar to the detection of Byzantine faults in distributed systems [15], where the expected behavior is expressed as an algorithm, except that here the expected behavior is also learned from the input.
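A minimal sketch of the "historical behavior" origin is given below: a sensor's expected behavior is learned from its own recent readings, and a strong deviation is flagged as a potential conflict. The window size, the z-score rule, and the threshold are assumptions, not values from the paper.

```python
# Hypothetical deviation detector based on a sensor's recent history.
from collections import deque
from statistics import mean, stdev


class HistoryCheck:
    """Flags readings that deviate strongly from a sensor's recent pattern."""

    def __init__(self, window: int = 50, z_threshold: float = 3.0) -> None:
        self.history: deque = deque(maxlen=window)
        self.z_threshold = z_threshold

    def is_deviation(self, value: float) -> bool:
        if len(self.history) >= 10:
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                return True   # outlier: report it, do not add it to the history
        self.history.append(value)
        return False


check = HistoryCheck()
for i in range(20):
    check.is_deviation(20.0 + 0.1 * (i % 5))   # learn the expected pattern
print(check.is_deviation(95.0))                # True: far outside the pattern
```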

4.2 Determining severity of conflict

One of our primary questions is: how severe is a given observed conflict? As originally proposed by Loke [18], a conflict needs to be addressed only if the resulting system behavior, should it be wrong, has serious consequences. The extent to which conflict resolution will actually improve the system's behavior must also be assessed. How much a conflict affects the behavior of the system depends on the situation(s) the system is considering. On the one hand, there may be situations where even a slight sensing error causes a fatal malfunction of the system. On the other hand, certain tasks may be resilient to large faults and discrepancies in sensing and situation modeling. For example, determining the level of attention in an audience can be done even if the sound channel is completely broken, whereas keeping track of a meeting with an unreliable face/speech recognizer may render the meeting minutes useless. In general, a task's sensitivity to error decreases with its coarseness, e.g. observing whether the crowd is applauding versus whether an individual person in the crowd is applauding. The severity of an action must therefore be known, either explicitly preset by the user or evaluated by the system.
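The sketch below illustrates severity-gated conflict handling in the spirit of Loke's consequence weighting: resolution effort is only spent when the expected impact of being wrong is high enough. The severity table, the task names, and the threshold are illustrative values, not taken from the paper.

```python
# Hypothetical severity-gated decision on whether to resolve a conflict.
SEVERITY = {
    "meeting_minutes_attribution": 0.9,  # wrong speaker labels ruin the minutes
    "audience_attention_level": 0.2,     # coarse estimate, tolerant to errors
}


def should_resolve(task: str, conflict_probability: float,
                   threshold: float = 0.3) -> bool:
    """Only spend effort on conflicts whose expected impact is high enough."""
    expected_impact = SEVERITY.get(task, 0.5) * conflict_probability
    return expected_impact > threshold


print(should_resolve("audience_attention_level", 0.8))     # False: coarse task
print(should_resolve("meeting_minutes_attribution", 0.8))  # True: fragile task
```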

4.3 Dealing with the Conflict

We defined a conflict in input as a disharmony between the system's hypothesis about the world (i.e., the situation model) and observed reality. Correcting a conflict has two purposes: first, to resolve the current discrepancy so that the system behaves correctly; second, to adjust the system so that when it faces the same (or a similar enough) situation again, it will not be trapped in the same undecided state. A typical assumption of such a system is that if its reaction is not contested, the conflict was resolved properly or its severity was minor. This can then be used as a positive sample.


For statistical models of context inference, such samples are then used to adapt the system to a new behavior, hopefully rendering it more up to the task. The same information is used to update the levels of trust associated with each of the sensors.
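As a minimal sketch of the trust update just described, each sensor's trust level could be nudged toward or away from full confidence depending on whether it agreed with the accepted resolution of a conflict. The exponential-moving-average rule and the learning rate are assumptions; the paper only states that trust levels are updated from such samples.

```python
# Hypothetical per-sensor trust adjustment from conflict-resolution feedback.
class TrustTracker:
    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha
        self.trust: dict = {}

    def update(self, sensor_id: str, agreed_with_outcome: bool) -> float:
        """Move trust toward 1.0 when the sensor agreed with the accepted
        resolution, toward 0.0 when it was overruled."""
        current = self.trust.get(sensor_id, 0.5)
        target = 1.0 if agreed_with_outcome else 0.0
        self.trust[sensor_id] = (1 - self.alpha) * current + self.alpha * target
        return self.trust[sensor_id]


tracker = TrustTracker()
tracker.update("body_tracker_A", agreed_with_outcome=False)  # was overruled
tracker.update("speech_detector", agreed_with_outcome=True)  # was confirmed
print(tracker.trust)
```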

5 Situation Modeling

Fig. 2. Situation modeling involves three layers, with entity events informing the situation model, and state change events informing the services.

The situation model is the layer that transforms a set of facts about the environment into a set of situations [6]. As shown in Figure 2, the main information flows from the lower layers to the upper layers; higher layers represent semantically higher abstractions in this model.

The set of facts comes from the perceptual components, which collect and aggregate sensor information (from lower layers not shown in Figure 2) into entities. These entities are the base representation for the situation model. In the meeting scenario, the entities are the props (room, table, chair, whiteboard, PDA, etc.) and the people. Each entity may have a variable number of attributes which describe it; examples are Person ID and Location for a person in the meeting room, or Location and Heading for a (movable) whiteboard. For each of these entities, the perceptual components send a sequence of attribute update events, which we call a stream. A stream is relative to a single attribute of a single entity. The situation model considers these entities as having a one-to-one mapping to the real world in the modeled environment.

The situation model uses the attributes of the entities provided by the perceptual components to infer the current state of a set of situations. An example of a situation is the Meeting, and its states may be Meeting/Presentation and Break (see Section 6.1 for a more complete set). The situation model will then generate an event indicating that a particular situation has changed its state. As the situation model is the layer doing most of the interpretation of sensory data and molding it into situations with a clear meaning, it is also the natural place to incorporate conflicting input detection and correction.
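The following sketch illustrates the entity/stream representation described above: a stream is a sequence of attribute update events, each relative to a single attribute of a single entity. The field and class names are assumptions for illustration and are not the actual CHIL interfaces.

```python
# Illustrative data structures for entities and attribute-update streams.
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AttributeUpdate:
    """One event in a stream: a new value for one attribute of one entity."""
    entity_id: str     # e.g. "person:42" or "whiteboard:1"
    attribute: str     # e.g. "Location", "PersonID", "Heading"
    value: Any
    timestamp: float


@dataclass
class Entity:
    """The situation model's one-to-one proxy for a real-world object."""
    entity_id: str
    kind: str                               # "person", "chair", "whiteboard", ...
    attributes: Dict[str, Any] = field(default_factory=dict)

    def apply(self, update: AttributeUpdate) -> None:
        self.attributes[update.attribute] = update.value


lecturer = Entity("person:42", "person")
lecturer.apply(AttributeUpdate("person:42", "Location", (2.5, 1.0), 1234.5))
print(lecturer.attributes)
```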


Fig. 3. Situation machines, present in the situation modeling layer, contain a state diagram (a) generating state change events. Several situation machines can be combined (b) to provide semantically higher information.

5.1 Situation Machines (SM)

The goal of the situation model is to extract semantically higher information from these facts. This is achieved through situation machines. A situation machine models a single situation and consists of a set of states, a set of attributes, and events about state changes. A sample state diagram for a meeting situation is depicted in Figure 3(a). The different states indicate the current status of the meeting. A transition from one state to another generates an event, and thus informs the higher layers (i.e. services or other SMs) about a change in the system's current hypothesis on the meeting. A service could then use this to switch on the light during the break, start the projector and presentation when the meeting starts, or display the identity of the current lecturer on a nearby screen. At the same time, the situation machine exposes its set of attributes through a function-call API (see Figure 2), so that the service may also, again in the meeting scenario, find out who the new lecturer is when the state changes from Break to Meeting/Presentation.

Situation machines may also model simpler aspects of the environment, and thereby build a hierarchy of contextual information, not unlike those presented in [4, 11]. As an example, in Figure 3(b), an attendance situation machine may simply track the number of people in the room. This attendance information can then be re-used by the meeting situation machine: it can infer the state of the meeting from the knowledge that there can be no meeting if the number of people is below 2.
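A condensed sketch of two cooperating situation machines follows: an Attendance SM tracking the head count and a Meeting SM that reuses it, as in Figure 3(b). The state names follow Figure 3, but the transition rules, class names, and the event mechanism are simplified assumptions for illustration.

```python
# Hypothetical situation machines: Attendance feeds Meeting.
from typing import Callable, List


class SituationMachine:
    def __init__(self, name: str, initial_state: str) -> None:
        self.name = name
        self.state = initial_state
        self.listeners: List[Callable[[str, str, str], None]] = []

    def set_state(self, new_state: str) -> None:
        if new_state != self.state:
            old, self.state = self.state, new_state
            for listener in self.listeners:     # notify services / other SMs
                listener(self.name, old, new_state)


class AttendanceSM(SituationMachine):
    def __init__(self) -> None:
        super().__init__("Attendance", "Empty")
        self.count = 0

    def on_people_count(self, count: int) -> None:
        self.count = count
        self.set_state("Occupied" if count > 0 else "Empty")


class MeetingSM(SituationMachine):
    def __init__(self, attendance: AttendanceSM) -> None:
        super().__init__("Meeting", "NoMeeting")
        self.attendance = attendance

    def reassess(self) -> None:
        # No meeting can take place with fewer than two people present.
        if self.attendance.count < 2:
            self.set_state("NoMeeting")


attendance = AttendanceSM()
meeting = MeetingSM(attendance)
meeting.listeners.append(lambda sm, old, new: print(f"{sm}: {old} -> {new}"))
meeting.set_state("Meeting/Presentation")   # inferred from other evidence
attendance.on_people_count(1)               # most attendees have left
meeting.reassess()                          # -> NoMeeting
```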


6 Dealing with conflicting input: the experiment

Each situation machine can be implemented using different methods, either heuristic-based or statistical-based. As an example, let us take a situation machine responsible for detecting the current state of a meeting; Figure 3(a) shows the state diagram of such a Meeting SM.

6.1 Selection of Proper Statistical Methods

In the statistical-based approach, creating a Meeting situation model means training a statistical model on prerecorded and tagged scenarios. We then use this model to detect the state of the meeting at run-time. To select a proper statistical method, we tested four different models on the task of classifying the meeting state in artificial scenarios: Id3 and C4.5 decision trees, random forests, and Bayesian networks.
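A minimal sketch of such a classification setup is shown below using scikit-learn, which provides CART-style decision trees and random forests (the Id3/C4.5 trees and Bayesian networks used in the paper would require other toolkits). The feature encoding, the toy training rows, and the labels are illustrative assumptions and not the actual SitCom data of Table 2.

```python
# Hypothetical meeting-state classification with scikit-learn.
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Each row: [people_count_bin, talking_activity_bin]  (toy, illustrative data)
X = [
    [0, 0],  # room empty, no speech               -> no meeting
    [2, 2],  # 3+ people, sustained speech         -> presentation
    [2, 1],  # 3+ people, short bursts of speech   -> break
    [1, 0],  # one or two people, no speech        -> no meeting
]
y = ["no_meeting", "presentation", "break", "no_meeting"]

for model in (DecisionTreeClassifier(), RandomForestClassifier(n_estimators=50)):
    model.fit(X, y)
    # Classify a new observation vector at run-time.
    print(type(model).__name__, model.predict([[2, 2]]))
```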

Fig. 4. Different meeting states of the artificial scenario in the 2D Room display of SitCom, from left to right: no meeting, meeting/presentation, break, questions and answers.

Because of the lack of real data suited to our task, we used our SitCom tool [16] to generate artificial scenarios. We believe that these artificial scenarios closely mimic the interaction between people in a real meeting environment. Figure 4 shows the different meeting states of this artificial scenario in the 2D Room display of SitCom. The input to the classification task is a vector of observations (Table 2). The PeopleCount type has three discrete values: empty, one or two, and more than 3 people. The TalkingActivity type has the following values: no speech activity, less than 1 minute of speech,
