Sonderforschungsbereich 378 Ressourcenadaptive Kognitive Prozesse
KI-Labor am Lehrstuhl für Informatik IV
Head: Prof. Dr. Dr. h.c. mult. W. Wahlster

REAL
Universität des Saarlandes, FR 6.2 Informatik IV
Postfach 151150, D-66041 Saarbrücken, Germany
Tel.: +49 - 681 / 302 - 2363

Memo Nr. 84

Artificial Intelligence in Mobile Systems 2004 (AIMS 2004) in Conjunction with the Sixth International Conference on Ubiquitous Computing 2004

Jörg Baus, Christian Kray, Robert Porzel (Eds.)

September 2004

Sonderforschungsbereich 378

Ressourcenadaptive Kognitive Prozesse

ISSN 0944-7822

84

© 2004 Universität des Saarlandes. Cover design, printing, and binding: Universität des Saarlandes. Printed in Germany. ISSN 0944-7822

Preface

AIMS 2004 is the fifth workshop in a series of AIMS workshops held at different conferences (ECAI, IJCAI, UbiComp). Due to the very positive response to last year's AIMS, this year's workshop is again held in conjunction with UbiComp in order to further explore the benefits of combining research from artificial intelligence and ubiquitous computing. AIMS 2004 brings together researchers working in various areas of (applied) AI as well as in mobile and ubiquitous computing systems. The workshop aims to explore recent research and findings in AI, in the development of mobile systems, and in their seamless integration into ubiquitous computing environments. The main objective of the workshop is a lively discussion and exchange of ideas, and the quality of this year's submissions holds great promise in this respect. In total, we received 18 papers, of which 13 were accepted (three of them as short presentations). We will have four sessions on navigation support, multi-modal interaction, and ontologies and user modelling, as well as on further topics related to AI in mobile systems. Looking forward to some very interesting presentations, we would like to thank the authors for their contributions and the reviewers for their feedback. Furthermore, we would like to extend our thanks to the workshop chair as well as the organisers of UbiComp 2004 for hosting AIMS 2004.

Nottingham, September 7th 2004,

Jörg Baus, Christian Kray, and Robert Porzel (organising committee)

i

Program Committee

Thomas Barkowsky (Bremen University, Germany)
Andreas Butz (Ludwig-Maximilians-University, Germany)
Keith Cheverst (Lancaster University, UK)
Eric Horvitz (Microsoft Research, USA)
Antonio Krüger (Saarland University, Germany)
Rainer Malaka (European Media Lab GmbH, Germany)
Thomas Rist (University of Applied Sciences Augsburg, Germany)
Albrecht Schmidt (Ludwig-Maximilians-University, Germany)
Georg Schneider (University of Applied Sciences Trier, Germany)
Massimo Zancanaro (ITC-IRST, Italy)

ii

Schedule and Table of Contents

9.00 - 9.15            Welcome
9.15 - 9.30   Short    Direction Concepts in Wayfinding Assistance Systems
                       Klippel, A., Dewey, C., Knauff, M., Richter, K.-F., Montello, D.R., Freksa, C., and Loeliger, E.-A. ..... 1
9.30 - 10.00  Full     Assistance for Spatio-Temporal Planning in Ubiquitous Computing Environments Based on Mental Models
                       Seifert, I., Bertel, S., and Barkowsky, T. ..... 9
10.00 - 10.30 Full     The Bum Bag Navigator (BNN): An Advanced Pedestrian Navigation System
                       Aslan, I., and Krüger, A. ..... 15
10.30 - 11.00          Coffee Break
11.00 - 11.30 Full     Strategies for Self-Organization and Multimodal Output Coordination in Distributed Environments
                       Elting, C., and Hellenschmidt, M. ..... 20
11.30 - 12.00 Full     A context inference and multi-modal approach to mobile information access
                       West, D., Apted, T., and Quigley, A. ..... 28
12.00 - 12.30 Full     Multimodal Interactions with an Instrumented Shelf
                       Wasinger, R., Schneider, M., Baus, J., and Krüger, A. ..... 36
12.30 - 13.45          Lunch
13.45 - 14.15 Full     Personal Ontologies for feature selection in Intelligent Environment Visualisations
                       Carmichael, D.J., Kay, J., and Kummerfeld, B. ..... 44
14.15 - 14.45 Full     OWL-based location ontology for context-aware services
                       Flury, T., Privat, G., and Ramparany, F. ..... 52
14.45 - 15.15 Full     Semantic User Profiles and their Applications in a Mobile Environment
                       von Hessling, A., Kleemann, T., and Sinner, A. ..... 59
15.15 - 15.30 Short    Supporting Personalized Interaction in Public Spaces
                       Cozzolongo, G., De Carolis, B., and Pizzutilo, S. ..... 64
15.30 - 16.00          Coffee Break
16.00 - 16.30 Full     User Preferences Initialization and Integration in Critique-Based Mobile Recommender Systems
                       Nguyen, Q.N., and Ricci, F. ..... 71
16.30 - 17.00 Full     Template-based Adaptive Video Documentaries
                       Rocchi, C., and Zancanaro, M. ..... 79
17.00 - 17.15 Short    Privacy-enhanced Intelligent Automatic Form Filling for Context-aware Services on Mobile Devices
                       Rukzio, E., Schmidt, A., and Hußmann, H. ..... 84
17.15 - 18.00          Closing Discussion

iii

Direction Concepts in Wayfinding Assistance Systems

A. Klippel (1), C. Dewey (2), M. Knauff (3), K.-F. Richter (2), D. R. Montello (4), C. Freksa (2), E.-A. Loeliger (2)

(1) Cooperative Research Centre for Spatial Information, Department of Geomatics, University of Melbourne, Australia, [email protected]

(2) Collaborative Research Center Spatial Cognition, Universität Bremen, Germany, {dewey, richter, freksa, loeliger}@sfbtr8.uni-bremen.de

(3) Max-Planck-Institute for Biological Cybernetics & Center for Cognitive Science, University of Freiburg, Germany, [email protected]

(4) Department of Geography, University of California, Santa Barbara, USA, [email protected]

Abstract
We report new findings about the mental representation of direction concepts and how these findings may revise formal models of spatial reasoning and navigation assistance systems. Research on formal models of direction concepts has a long tradition in AI. While early models were designed for unstructured space, for example, reasoning about cardinal directions, research on the influence of context has questioned the universal applicability of these models; mental direction concepts in city street networks differ from those in sea or air navigation. We investigated direction concepts at intersections in city street networks by using methods from cognitive psychology for eliciting conceptual knowledge. The results are used to modify the direction concepts employed in our wayfinding assistance framework. Within this framework it is possible to use abstract conceptualizations and to externalize them in different formats, for example, verbal or pictorial. Hence, this research may influence both verbal and pictorial route directions and, additionally, the transfer from one into the other.

Keywords
direction concepts, research cycle, wayfinding assistance system, conceptual modeling

Introduction
Concepts of directions have a long tradition in AI research, and various models have been proposed for different areas of application, for example, in qualitative spatial reasoning. Directions (orientations) are viewed as basic spatial relations (Habel, Herweg, and Pribbenow, 1995). Common to all models is that equivalence classes are used to represent a category of directions, following the general approach of AI to reduce and structure the information available. Early direction models partitioned space homogeneously. Applications to cardinal directions and to egocentric reference systems can be found in Frank (1992) or Hernandez (1994). Different levels of granularity were achieved, for instance, by bisecting sectors, i.e., 4-sector models were transformed into 8-sector models, and so on. Besides using sectors to represent direction categories, some models use axes as well as sectors. The double-cross calculus by Freksa (1992) (see also Freksa & Zimmermann, 1992) and the cardinal direction model by Ligozat (1998) are examples of their application. In models that use axes, two options can be differentiated: axes that are true axes, i.e., they represent an equivalence class of their own, and axes that are prototypical instances of a sector and are taken as the representation of that sector. In the approaches by Freksa, Zimmermann, and Ligozat mentioned above, the axes are 'true' axes. Other approaches, like the smart environment approach by Baus, Breihof, Butz, Lohse, and Krüger (2000) or the wayfinding choreme approach by Klippel (2003), use axes as prototypical instantiations of sectors to bridge, for example, the gap between underspecified expressions found in natural languages and the graphic representation of direction concepts. The early homogeneous direction models have been extensively criticized by Montello and Frank (1996). They ran various simulations to explain data that Sadalla and Montello (1989) collected and found that direction models with differently sized sectors fit the behavioral data best. This example shows how important behavioral research is and how it can be employed to modify existing models. Such research cycles are not only valid for Human-Centered Design (e.g., ISO 13407) but also for basic research as pursued in this paper.
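As a concrete illustration of such a homogeneous partition, the following Python sketch maps a compass bearing onto one of eight equally sized sectors; it exemplifies the early model family discussed above (and how bisecting sectors yields finer granularity), and it is an illustrative sketch rather than code from any of the cited systems.

    # Homogeneous 8-sector cardinal direction model: every sector spans 45 degrees.
    CARDINAL_8 = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

    def cardinal_direction(bearing_deg: float) -> str:
        """Map a compass bearing (0 = North, clockwise) to an 8-sector label."""
        sector = int(((bearing_deg % 360.0) + 22.5) // 45.0) % 8
        return CARDINAL_8[sector]

    # Example: cardinal_direction(100.0) returns "E". Bisecting each sector once
    # more would yield a homogeneous 16-sector model (N, NNE, NE, ...), and so on.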

1

Direction Concepts in Route Directions
The processing and representation of angular/direction information is essential for human spatial cognition and especially for wayfinding (e.g., Sholl, 1988; Montello et al., 1999; Waller et al., 2002). A growing number of experimental results indicate that route directions and wayfinding basically consist of making direction choices at decision points (e.g., Denis et al., 1999). Pursuing this line of thought, wayfinding can be characterized as following a route segment up to a decision point, making a directional choice, following the next route segment up to the next decision point, making a directional choice, and so on. Decision points can be operationalized as belonging to two main categories: decision points with a direction change (DP+) and decision points without a direction change (DP-). The question arises: how do humans conceptualize directions at decision points, especially at DP+? What are prototypical direction (turning) concepts, and what do their graphical externalizations look like? For most situations, qualitative information about direction—in the sense of a small number of equivalence classes—is sufficient. Especially in city street networks, which constrain the environment, directional choices based on exact angular information are rarely necessary. Various studies show that angular information in city street networks—as well as in geographic space in general—is conceptualized and remembered qualitatively by humans (e.g., Byrne, 1979; Tversky, 1981; Moar & Bower, 1983). Verbal route directions reflect this qualitativeness: precise, i.e., very fine-grained, direction information is an exception that is hardly ever given (e.g., Denis et al., 1999; Allen, 2000; Klippel & Montello, submitted). If we take the perspective of conceptual spatial primitives (e.g., Golledge, 1999; Klippel, 2003), the question arises how many different categories of directions are necessary and how many categories humans employ. Additionally, we can ask whether there are prototypical turning concepts at all and what their graphic or verbal externalizations may look like. Evans (1980) reported three major strategies that occur in mentally representing directional information in city street networks. These strategies are:

• straightening curved paths,
• squaring oblique intersections, and
• aligning nonparallel streets.

The second observation is strongly supported by recent approaches to cognitively adequate route directions (e.g., Tversky & Lee, 1998, 1999). However, especially for European-style city street networks, this observation has to be researched in greater detail, since these networks are often not regularly shaped.

Direction Concept Experiment
To explore how many categories have to be assumed for directions in city street networks, and what the relation between sectors and axes is in this domain, we used an experimental method from cognitive psychology. We chose the grouping task paradigm, which is traditionally one of the most important methods for investigating conceptual knowledge in psychology (e.g., Cooke, 1999). The main idea of such tasks is that conceptual knowledge plays the central role in rating the similarity of given stimuli: stimuli are assessed as similar if they are instances of the same concepts; they are assessed as dissimilar if they are instances of different concepts. If other aspects of presentation are controlled, as in our experiment, such grouping experiments can provide important insight into the internal structure of conceptual knowledge. To realize the experiment, we used an experimental tool developed by Knauff, Rauh, and Renz (1997). The tool realizes a method that is comparable to card sorting but helps to generate the experimental materials, presents the stimuli, and collects the relevant data. In contrast to other card sorting/grouping tools (e.g., Harper et al., 2003), it is especially designed to use pictorial stimuli and is therefore well suited for spatial and map-related research.

Methods

Participants. Twenty-five students of the University of Bremen were paid for their participation (9 female, 16 male).

Design and Procedure. The experiments took place in a lab space at the University of Bremen. The grouping tool was adapted to the requirements of the present study. 108 icons were used to depict different possibilities to 'make a turn' at an intersection. We designed the icons according to the following criteria: using 1-degree increments would have resulted in 359 different icons (excluding the 'direct back'), which seemed infeasible. Instead, we set out from the results of Klippel (2003) and added bisection lines incrementally. In other words, starting with the prototypical direction concepts (in degrees: 45, 90, 135, 180, 225, 270, 315), we added bisecting lines for the resulting sectors four times. This resulted in increments of 5.625 degrees. We only used two branches of an intersection; the participants were advised to imagine the pictures as representations of possible turns at an intersection. The back sector (corresponding to angles between 315 and 45 degrees) was excluded for graphical reasons. The items were doubled to test whether the same items were placed in the same groups. The icons were integrated into the grouping tool. Figure 1 shows a screenshot of an ongoing experiment. The grouping tool divides the screen into two parts. On the left side, the stimulus material, i.e., all icons depicting possibilities to 'make a turn' at an intersection, is placed in random order. The large number of icons requires scrolling to access all items. This is a common procedure in interacting

2

with computer interfaces; no problems were expected nor found during the experiments. The right side of the screen is empty at the start; here, groups of icons are created during the experiment. The actions the participants could perform were kept simple, and the interface shows no unnecessary features. Participants could perform the following actions:

• Create a new group: the grouping tool allows participants to create as many groups as they want and regard as suitable for the task at hand. For each group, a new box is created on the right side of the screen. In case more items are placed in a group than fit the width of the original box, the box extends and scrolling is required to access all items in one group.

• Delete a group: participants were allowed to delete groups. The grouping tool requires a group to be empty before it can be deleted.

• Rearrange: the items on the left part of the screen can be newly arranged.

• Done: indicates that the task is completed.

Finally, the participants were asked to verbally label the groups that they had created.

Figure 1. The grouping tool (snapshot from an ongoing experiment). On the left side, the icons representing turns at intersections are presented in random order. On the right side, a participant has started to group icons according to her categories of turning actions.

Results
A hierarchical cluster analysis was used to analyze the data. The procedure is an exploratory tool designed to reveal natural groupings within a data set. It identifies relatively homogeneous groups of items (cases), using an algorithm that starts with each case in a separate cluster and combines clusters until only one is left. There are different possibilities to compute the clusters. We used the "linkage between groups" method, as it provides a low variance within groups. Additionally, we chose squared Euclidean distance to enhance the grouping procedure. The output of a cluster analysis is usually a dendrogram in which the grouping of the individual items is provided stepwise, i.e., for each new calculation step it is shown which items fall into the same group and which groups merge, respectively. However, instead of a dendrogram, we visualized the data as rays corresponding to the directions depicted by the icons used in the experiment. Each ray represents one icon; the rays are doubled since two identical icons exist. This way, it is possible to visualize the different steps of the grouping algorithm. Since in the end each cluster analysis groups all items into one single group, we defined a finishing criterion: as soon as no groups are combined in two consecutive calculation steps, the clustering stops.

We briefly discuss the individual steps, as they highlight interesting aspects of mental direction concepts, too. Figure 2 illustrates the following discussion. The first level of clustering (Figure 2, part 1) does not show much more than that some directions start grouping together—indicated by the little geometric figures at the end of each ray—while others do not. On the second level (Figure 2, part 2), however, a clearer picture starts to take shape: while most directions are placed in a group, three of them remain ungrouped—these are the rays that could be labeled 'straight', 'exact left', and 'exact right'. It is noteworthy that indeed all other icons (direction concepts) are already grouped. To some groups it is already possible to assign verbal labels, while others may require more complicated expressions. The next levels of clustering show that there seems to be a relation between the persistence of a group and the simplicity (or complexity) of its potential verbal label. The back plane—from 270 over 0 to 90 degrees—seems to form fewer groups and seems to be more clearly structured than the front plane. Yet, this result may be biased by leaving out the 'back' sector in the study design. In step (3), the front plane becomes more structured and the back plane starts to form one big group on the right side. Of the three axes of step (2), only two remain: the 'straight' and the 'right' axis. In step (4), more groups of the front plane merge, and step (5) is the last step in this analysis. The results of the clustering show 7 clearly distinguishable groups; one of them is an axis. The left and the right plane are symmetric; the front and the back plane are not. Interestingly, the front and the back plane are clearly separated by 90-degree left and right turns. The direction sectors differ in their size. As mentioned before, cluster analysis is not necessarily designed to verify (or falsify) hypotheses but to find clusters. Additionally, not only the end result is of interest, but also the individual steps. Regarding earlier steps of the analysis, it seems that a sector is not necessarily prototypically represented by the bisecting line of that sector (i.e., for left and right).
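As a rough sketch of how such grouping data can be fed into a hierarchical cluster analysis, the following Python fragment builds a co-assignment profile per icon and clusters these profiles using squared Euclidean distances and between-groups (average) linkage, the combination named above. The input format and the preprocessing are assumptions for illustration; they do not reproduce the tool chain actually used in the experiment.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    def cluster_groupings(groupings, n_icons, n_clusters=7):
        """groupings: one dict per participant mapping icon index -> group label
        (hypothetical input format). Returns a flat cluster label per icon."""
        co = np.zeros((n_icons, n_icons))
        for assignment in groupings:
            for i in range(n_icons):
                for j in range(n_icons):
                    if i in assignment and j in assignment and assignment[i] == assignment[j]:
                        co[i, j] += 1.0
        co /= len(groupings)                      # co-assignment frequencies in [0, 1]
        dists = pdist(co, metric="sqeuclidean")   # squared Euclidean distances
        tree = linkage(dists, method="average")   # between-groups (average) linkage
        return fcluster(tree, t=n_clusters, criterion="maxclust")

    # Example with 4 icons and 2 participants who sorted them identically;
    # the default n_clusters=7 mirrors the seven groups reported in the analysis.
    print(cluster_groupings([{0: "a", 1: "a", 2: "b", 3: "b"},
                             {0: "x", 1: "x", 2: "y", 3: "y"}], n_icons=4, n_clusters=2))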

3


Figure 2. Results of the clustering analysis. Each ray represents one of the icons used in the grouping tool. Stars 1-5 show the first 5 steps of the clustering algorithm. The geometric figures at the end of each ray indicate groups of icons.

4

Discussion and Application to Mobile Systems
Based on these findings, we propose a revised model of direction concepts applicable to the generation of verbal route directions and the schematization of maps, especially in mobile systems and electronic route planners (see Figure 3). In general, the abstract conceptual characterization and the automatic generation of route directions require a formal model to decide when a change in direction is considered, for example, a 'left turn' or a 'veer left'. The proposed model is the basic model for direction concepts at intersections in city street networks. Changing situations and changing contexts—T-intersections or traffic circles, transportation modalities or traveling speed—require further investigation.
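To make the role of such a formal model concrete, here is a minimal Python sketch of a sector-based decision rule that maps a turn angle at an intersection to one of the direction concepts shown in Figure 3. It follows the angle convention of the stimuli described above (180 degrees = straight ahead, the back sector around 0/360 degrees excluded), but the numeric sector boundaries, the width of the STRAIGHT band, and the assignment of sides are illustrative assumptions: the revised model prescribes differently sized sectors plus an axis without publishing exact bounds in this paper.

    STRAIGHT_TOLERANCE = 2.0   # assumed width of the STRAIGHT band around the axis

    # (lower bound, upper bound, concept) in degrees; 180 = straight ahead,
    # 90 = exact right, 270 = exact left. Boundaries are placeholders only.
    SECTORS = [
        ( 45.0,  80.0, "SHARP RIGHT"),
        ( 80.0, 100.0, "RIGHT"),
        (100.0, 178.0, "VEER RIGHT"),
        (182.0, 260.0, "VEER LEFT"),
        (260.0, 280.0, "LEFT"),
        (280.0, 315.0, "SHARP LEFT"),
    ]

    def direction_concept(angle_deg: float) -> str:
        """Map a turn angle to a direction concept label (illustrative boundaries)."""
        a = angle_deg % 360.0
        if abs(a - 180.0) <= STRAIGHT_TOLERANCE:
            return "STRAIGHT"                 # the axis found in the cluster analysis
        for lower, upper, concept in SECTORS:
            if lower <= a < upper:
                return concept
        return "BACK"                         # 315..45 degrees: excluded from the study

    # Example: direction_concept(95.0) -> "RIGHT"; direction_concept(130.0) -> "VEER RIGHT"

Unlike the homogeneous eight-sector sketch in the Introduction, the sectors here deliberately differ in size, which is the main point the clustering results make for such a model.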

Figure 3. Original (upper part) and revised (lower part) direction model for the generation of verbal route directions and the schematization of maps. Black lines represent prototypical directions (e.g., Klippel, 2003). (The sector labels in the figure are STRAIGHT, VEER LEFT, LEFT, SHARP LEFT, VEER RIGHT, RIGHT, and SHARP RIGHT.)

On the last level of the presented cluster analysis, our model comprises the following features (see Figure 3): it consists of 7 sectors of different sizes; more precisely, 6 differently sized sectors plus 1 axis (plus the not further examined 'back' sector). As mentioned above, the formal specification of direction concepts is necessary for the following three application areas:

• the abstract conceptual characterization of route knowledge,
• the assignment of natural language expressions to turning concepts, and
• the schematic presentation of route maps.

The first area of application is ongoing research in the SFB/TR 8 MapSpace project and the CRC SI (e.g., Klippel et al., preliminarily accepted; Richter, Klippel, and Freksa, 2004). The aim is to devise a conceptual specification language for route knowledge. The abstract representation format is chosen to allow for a situation- and context-adapted provision of route information in different modalities and externalization formats, for example, verbal or graphical. To use this model in a wayfinding assistance system, the sectors and the axis can be assigned natural language expressions, for example, turn right, turn left, go straight (see Figure 3; see the Outlook for a discussion of work on a corpus of verbal externalizations of direction concepts). On this basic level, our results challenge the assumption of homogeneous direction models and render the specification of the conceptual structures underlying directions in city street networks more precise. Yet, for the assignment of proper natural language expressions, a couple of open questions remain:

• How can we explain the rather sharp demarcation of front and back plane?
• What would be a proper label for the … sector?
• What are proper labels for the other sectors?
• How can we account for results gained by Dale, Geldof, and Prost (2003) and Klippel, Tappe, and Habel (2003), which show that aggregation/chunking is one of the key elements in the conceptualization of route elements, i.e., that a term like 'turn right at the post office' or 'turn right at the third intersection' is preferred over 'go straight, go straight, turn right' (see also Wahlster et al., 1998)?

Some answers can be provided based on methodological considerations and require a more detailed analysis of the existing data, for example, an account of individual differences and a juxtaposition and discussion of the results of different clustering methods. Others are left for future work, like the further specification of the 'back' sector. Mobile wayfinding assistance systems not only provide verbal instructions but also communicate in a map-like manner. How can the results of our study be used to devise graphic route directions? One of the greatest issues in route map design—web mapping, mobile services, etc.—is the definition of suitable schematization algorithms (e.g., Agrawala & Stolte, 2001) or, from a more theoretical perspective, the question of aspectualization (e.g., Freksa, 1999). The approach taken here is to set out from prototypical representations of the basic actions in following a route and of the communication of these actions, respectively. This idea originates in work by Tversky and Lee (1998, 1999) on toolboxes for verbal and graphical route directions, which has already inspired the approach by Agrawala and Stolte (e.g., 2001). While the latter abandoned the idea of primitives to come up with an excellent

5

technical solution, we stick to the idea of having prototypical graphical representations of (mental) conceptual primitives of route direction elements (e.g., Klippel and Richter, 2004). The prototypical direction concepts (black lines in Figure 3) needed for creating route maps or parts thereof originate in the model by Klippel (2003). Yet, although there seem to be prototypical direction concepts, not all of them represent sectors, and they cannot easily be computed as bisecting lines. In the case of the concepts LEFT and RIGHT, they seem to demarcate the lower boundary of a sector. Likewise, in other sectors they do not seem to be the bisecting lines. However, in this basic case left and right can be treated equally, which keeps the model computationally feasible. If we stick to the idea of prototypical graphical representational elements for (mental) conceptual elements of routes, the newly proposed model allows for the assignment of different turning angles to the corresponding prototypical graphical representations. In contrast to natural language expressions, we do not have the luxury of an inherently underspecified representation but are forced to decide on exactly one instantiation (see also Habel, 1998). Maps are bounded, often temporally and spatially fixed media that make it necessary to commit to one of many different alternative representations at a given point in time. The prototypical elements are therefore taken as the representations of the sectors (and the axis) found in the results of this study.

The new model corresponds to various results in behavioral cognitive science (e.g., a more highly differentiated front plane; for an overview cf. van der Zee and Slack, 2003). The rather unusual demarcation of the front and the back plane can additionally be explained by the visual characteristics of traveling through a city street network: one can look into the streets in the front plane (resulting in a greater differentiation) but not into the streets in the back plane. This poses an interesting question on the difference between the overview (bird's-eye) perspective and the route (field) perspective (e.g., Herrmann et al., 1995) and can be exploited in the difference between maps and real-world (or VR) interfaces (e.g., Klippel & Montello, submitted). The interesting theoretical question is: where do our direction concepts come from? Are they persistent phenomena, or do they change with respect to the interaction with different interfaces? Are direction concepts embodied (e.g., Wilson, 2002), i.e., is our body with its physical characteristics the driving force behind direction concepts, or do we have to assume other factors? Plenty of research relates our body axes to the direction concepts we have developed (e.g., Bryant, 1992). To what degree these results are applicable to city street networks is an open question. Additionally, for the embodiment explanation there are two alternative perspectives. The first is that the environment as such is responsible for shaping our concepts. This could imply that North American city dwellers have different concepts than Europeans, who are exposed to irregular street grid patterns to a greater extent (e.g., Davies & Pederson, 2001). Second, natural language could have—to a certain degree—an influence on direction concepts. The latter topic is discussed under the term 'linguistic relativity' (e.g., Gumperz & Levinson, 1996). Evidence for the influence of language can also be found in our experiment, as some participants reported that if they had known in advance that the groups had to be labeled, they would have created different groups. This question is under ongoing research in our lab, and it might lead to modifications of the conceptual model assumed for verbal as opposed to graphic route directions. The distinction by Klippel (2003) between standard direction concepts (LEFT, RIGHT, and STRAIGHT) and modified directions (e.g., VEER LEFT)—as a concept, not literally as a verbal expression—has been, or can be, extended to supermodified concepts like VEER SLIGHTLY LEFT. This becomes obvious from the discussion of the individual steps in the cluster analysis as well as from language analyses (e.g., Klippel & Montello, submitted). These concepts may not be as dominant as the standard and modified turning concepts but are present until the third and fourth step in the cluster analysis, respectively (see Figure 2).

One remaining problem is the flexibility of clustering methods. This flexibility makes cluster analysis a very valuable tool for exploratory data analyses, as different clustering methods may be used to reveal the structure in the data. As such it was used in the present work. A word of caution, however, is that the method provides several degrees of freedom, so that it is not suitable for more detailed statistical analysis or even for a test of specific hypotheses. Having said that, we still believe that the present findings help to develop more specific hypotheses and to gain more insight into the mental representation of direction concepts.

Conclusions and Outlook
The work undertaken here is part of a greater research effort on specifying the (mental) conceptual structure underlying direction concepts in route directions. As the change in direction at decision points is the most pertinent information in wayfinding and route directions, the focus on this knowledge should provide: first, valuable insight into basic research questions on cognitive processes underlying wayfinding and route directions; second, a challenge to existing assumptions on the formal specification of direction concepts; and third, an alternative to existing direction models. The results presented here are exploratory in nature but nonetheless allow for the modification of existing formal direction models. Based on this initial research, we are undertaking further research efforts on three basic research questions: the (mental) conceptual structure of direction concepts, the verbal externalization of mental direction concepts, and the application of the findings to (graphic) schematization principles.

6

Acknowledgements This research is carried out as a part of the SFB/TR 8 Spatial Cognition. Funding by the Deutsche Forschungsgemeinschaft (DFG) is gratefully acknowledged. We would like to thank Thilo Weigel for his work on the first grouping tool. We benefited greatly from discussions with Heike Tappe and the participants of the SCRAM research meeting in Santa Barbara.

Freksa, C. (1992). Using orientation information for qualitative spatial reasoning. In A.U. Frank, I. Campari, and U. Formentini (Eds.), Theories and methods of spatio-temporal reasoning in geographic space (pp. 162-178). Berlin: Springer. Freksa, C. & Zimmermann, K. (1992). On the utilization of spatial structures for cognitively plausible and efficient reasoning. In Proceedings of the 1992 IEEE International Conference on Systems, Man, and Cybernetics (pp. 261-266). Chicago. Reprinted in F.D. Anger, H.W. Güsgen, and J. van Benthem (Eds.), Proceedings of the IJCAI-93 Workshop on Spatial and Temporal Reasoning (pp. 61-66), Chambéry, 1993.

References Agrawala, M. & Stolte, C. (2001). Rendering effective route maps: Improving usability through generalization. In E. Fiume (Ed.), Siggraph 2001. Proceedings of the 28th Annual Conference on Computer Graphics, Los Angeles, California, USA (pp. 241-250). ACM Press.

Golledge, R.G. (1995). Primitives of spatial knowledge. In T.L. Nyerges, D.M. Mark, R. Laurini, and M.J. Egenhofer (Eds.), Cognitive aspects of human - computer interaction for geographic information systems (pp. 29-44). Dordrecht: Kluwer Academic Publishers.

Allen, G.L. (2000). Principles and practices for communicating route knowledge. Applied Cognitive Psychology, 14, 333-359. Baus, J., Butz, A., Krüger, A., Lohse, M., & Breihof, C. (2000). Some Aspects of Scouting Smart Environments. Proceedings of the AAAI Spring Symposium on "Smart Graphics", March 20th-22nd 2000, Stanford, CA, USA.

Gumperz, J. J., & Levinson, S. C. (1996). Rethinking Linguistic Relativity. Cambridge, UK: Cambridge University Press. Habel, C. (1998). Piktorielle Repräsentation als unterbestimmte räumliche Modelle. Kognitionswissenschaft, 7, 5867.

Bryant, D. J. (1992). A spatial representation system in humans. Psycoloquy, 3(16), Space (1). Byrne, R.W. (1979). Memory for urban geography. Quarterly Journal of Experimental Psychology, 31, 147-154. Cooke, N. J. (1999). Knowledge elicitation. In F. T. Durso (Ed.), Applied Cognition. Chichester, UK: Wiley.

Habel, C., Herweg, M., and Pribbenow, S. (1995). Wissen über Raum und Zeit. In G. Görz (Ed.), Einführung in die künstliche Intelligenz (2nd ed.) (pp. 129-185). Bonn: Addison-Wesley.

Dale, R., Geldof, S., & Prost, J.-P. (2003). CORAL: Using natural language generation for navigational assistance. Proceedings of the 26th Australasian Computer Science Conference (ACSC2003), Adelaide, Australia.

Harper, M. E., Jentsch, F. G., Berry, D., Lau, H. D., Bowers, C., & Salas, E. (2003). TPL-KATS-card sort: A tool for assessing structural knowledge. Behavior Research Methods, Instruments, & Computers, 35(4), 577-584.

Davies, C., & Pederson, E. (2001). Grid patterns and cultural expectations in urban wayfinding. In D. R. Montello (Ed.), Spatial Information Theory. Foundations of Geographic Information Science. International Conference, COSIT 2001, Morro Bay, CA, USA. (pp. 400-414). Berlin: Springer.

Herrmann, T., Buhl, H.M., and Schweizer, K. (1995). Zur blickpunktbezogenen Wissensrepräsentation: der Richtungseffekt. Zeitschrift für Psychologie, 203, 1-23. Hernández, D. (1994). Qualitative representation of spatial knowledge. Berlin: Springer. ISO 13407: Human-centered design processes for interactive systems.

Denis, M., Pazzaglia, F., Cornoldi, C., and Bertolo, L. (1999). Spatial discourse and navigation: An analysis of route directions in the city of Venice. Applied Cognitive Psychology, 13, 145–174.

Klippel, A. (2003). Wayfinding Choremes. Conceptualizing Wayfinding and Route Direction Elements. Bremen: Universität Bremen.

Evans, G.W. (1980). Environmental cognition. Psychological Bulletin, 88, 259-287.

Klippel, A., & Montello, D. R. (submitted). On the Robustness of Mental Conceptualizations or the Scrutiny of Direction Concepts. (Extended abstract, GIScience 2004).

Frank, A.U. (1992). Qualitative spatial reasoning about distances and direction in geographic space. Journal of Visual Languages and Computing, 3, 343-371.

Klippel, A., & Richter, K.-F. (2004). Chorematic Focus Maps. In G. Gartner (Ed.), Location Based Services & Telecartography. Proceedings of the Symposium 2004. (pp. 39-45). Wien, Austria.

Freksa, C. (1999). Spatial aspects of task-specific wayfinding maps. In J. S. Gero & B. Tversky (Eds.), Visual and Spatial Reasoning in Design (pp. 15-32). Key Centre of Design Computing and Cognition, University of Sydney.

Klippel, A., Tappe, T., & Habel, C. (2003). Pictorial Representations of Routes: Chunking Route Segments during Comprehension. In C. Freksa, W. Brauer, C. Habel & K. F.

7

Wender (Eds.), Spatial Cognition III. Routes and Navigation, Human Memory and Learning, Spatial Representation and Spatial Learning (pp. 11-33). Berlin: Springer.

Sadalla, E.K. & Montello, D.R. (1989). Remembering changes in direction. Environment and Behavior, 21(3), 346-363.

Klippel, A., Tappe, T., Kulik, L., & Lee, P. U. (preliminarily accepted). Wayfinding Choremes - A Language for Modeling Conceptual Route Knowledge. Journal of Visual Languages and Computing.

Sholl, M.J. (1988). The relation between sense of direction and mental geographic updating. Intelligence, 12, 299-314.

Knauff, M., Rauh, R., and Renz, J. (1997). A cognitive assessment of topological spatial relations: Results from an empirical investigation. In S.C. Hirtle & A.U. Frank (Eds.), Spatial information theory: A theoretical basis for GIS (pp. 193-206). Berlin: Springer.

Tversky, B. & Lee, P. (1998). How space structures language. In C. Freksa, C. Habel, and K.F. Wender (Eds.), Spatial Cognition. An interdisciplinary approach to representing and processing spatial knowledge (pp. 157-175). Berlin: Springer.

Ligozat, G. (1998). Reasoning about cardinal directions. Journal of Visual Languages and Computing, 9, 23–44.

Tversky, B. & Lee, P. (1999). Pictorial and verbal tools for conveying routes. In C. Freksa & D.M. Mark (Eds.), Spatial information theory. Cognitive and computational foundations of geographic information science (51-64). Berlin: Springer.

Tversky, B. (1981). Distortions in memory for maps. Cognitive Psychology, 13(3), 407-433.

Moar, I. & Bower, G.H. (1983). Inconsistency in spatial knowledge. Memory and Cognition, 11(2), 107-113. Montello, D.R. & Frank, A.U. (1996). Modeling directional knowledge and reasoning in environmental space: Testing qualitative metrics. In J. Portugali (Ed.), The construction of cognitive maps (pp. 321-344). Dordrecht: Kluwer Academic Publishers.

Van der Zee, E., & Slack, J. (2003). Representing Directions in Language and Space. Oxford: Oxford University Press. Wahlster, W., Blocher, A., Baus, J., Stopp, E., & Speiser, H. (1998). Ressourcenadaptierende Objektlokalisation: Sprachliche Raumbeschreibung unter Zeitdruck. Kognitionswissenschaft (Sonderheft zum Sonderforschungsbereich 378).

Montello, D.R., Richardson, A.E., Hegarty, M., and Provenzy, M. (1999). A comparison of methods for estimating directions in egocentric space. Perception, 28, 981-1000. Richter, K.-F., Klippel, A., & Freksa, C. (2004). Shortest, Fastest - but what Next? A Different Approach to Route Directions. In Geoinformation und Mobilität - von der Forschung zur praktischen Anwendung. Beiträge zu den Münsteraner GI-Tagen 2004 (pp. 205-217). Münster: IfGIprints, Institut für Geoinformatik.

Waller, D., Montello, D. R., Richardson, A. E., & Hegarty, M. (2002). Orientation specificity and spatial updating of memories for layouts. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1051-1063. Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin and Review, 9, 625-636.

8

Assistance for Spatio-Temporal Planning in Ubiquitous Computing Environments Based on Mental Models

Inessa Seifert, Sven Bertel, Thomas Barkowsky
SFB/TR 8 Spatial Cognition, Universität Bremen, Germany
[email protected], [email protected], [email protected]

Abstract
This paper addresses a spatio-temporal configuration problem that consists of integrating a set of interdependent constraints. The problem's scenario is a day at a trade fair during which meetings need to be dynamically scheduled and assigned spatial locations on a map. For this type of configuration problem, mental problem solving is model-based, i.e., the problem is mentally solved by instantiating constraints; where multiple instantiations are possible, typically only a few get constructed. As a result, the performance of a corresponding planning assistance system does not only depend on its use of computational resources but also on the user's cognitive effort required to understand the current state of the system and to guide the planning process. As a corollary, cognitive processing models have to be integrated into the assistance system to allow for better prediction of current cognitive effort and reasoning preferences. We analyze the scenario with respect to model-based problem solving strategies and propose first ideas towards an assistance system that presents itself through different media in a ubiquitous computing environment.

Keywords
Cognitive assistance, spatio-temporal reasoning, mental models, ubiquitous computing.

1 INTRODUCTION
The problem type addressed in this paper is the integration of a set of interdependent spatial and temporal constraints that make up a configuration problem. Typically, the problem is dynamic, as either the set of relevant constraints may vary during the problem solving process, or variation may exist in the set of values that are considered legal assignments for a specific variable. The constraints are usually interdependent to some degree, such that the selection of an assignment for one variable may restrict the set of legal assignments for another. The scenario of a sample configuration problem of this type, namely the dynamic planning and scheduling that takes place during a day at a trade fair, will be described in detail in the following section. In many cases, not all constraints of a problem are amenable to formalization, resulting in partially unformalized constraint problems (cf. [20]). With problems of this class, a number of decisions cannot be outsourced to an automatic constraint solver for various reasons, and the human problem solver and a computational assistance system must collaborate in the problem solving process. In doing so, they constitute an interactive reasoning system. Reasons why constraints may not be treated automatically include that they relate to a human's implicit preferences or knowledge, to emotions, to issues that involve esthetics, or, simply, that they are hard to verbalize (cf. [3]).

Clearly, the performance of an interactive human-computer reasoning system does not only depend on the use of computational resources in the computational part but also on that of human cognitive resources. Take as an example the user's cognitive effort that is required to understand the current state of the assistance system, or the effort required to guide the planning process (e.g., by selecting partial solutions that were generated computationally). Human intelligence is one of the bottlenecks in the process; in order to achieve good collaboration and to allow for better prediction of the current cognitive stresses on the human reasoner, cognitive processing models are required within the computational part.

In the following section, we will further consider the trade fair scenario. Afterwards, we will turn to mental model-based problem solving in Section 3. Subsequently, requirements for assisting mental model-based reasoning will be assessed. In Section 5, we will present the outline of an assistance system that operates through different media in a ubiquitous computing environment.

2 SCENARIO: A DAY AT A TRADE FAIR
Imagine that you are attending a trade fair where you intend to visit a number of exhibition booths and where you want to talk to a number of people. You have compiled a list of events in which you want to participate, as well as a checklist of meetings that you would like to arrange. Now, you are in the middle of the trade fair, you hold a leaflet with the spatial overview of the trade fair area in your hand, and you are about to decide how to manage your agenda to get everything done within the time available. So, maybe you are about to decide between two options: one is to go to the northern part of the fair and watch the new technologies at a specific manufacturer's exhibition that you planned to visit. It would take about 10 to 15 minutes to get there, and probably you could get a cup of coffee and a piece of cake there. The second option is to prepare to go to a scheduled appointment with a customer on the west side of the fair. This appointment will start in one

9

hour from now, and it will take 20 minutes to get to the customer's booth. Actually, you will have plenty of time if you choose option two. Suddenly, your mobile phone rings. A colleague of yours informs you of a very interesting booth in the eastern part of the fair. You know that it would take about 10 minutes to get to the booth advised by your colleague and maybe 30 minutes to take a look at the exhibition and to chat with your colleague about the things presented. On the other hand, it is almost lunch time and a snack would be nice. So how do you decide right now, and what are the consequences for the other events that are on your list? If you were able to follow the description up to this point, you have good working memory capacity. But don't you think a visualization of the situation, not only in your mind but also in your hand, would be a good support in such a decision process?

So far, the aspects mentioned have been more or less static. That means that new events may be added to the list of things to be dealt with, but the locations at which events will take place are fixed. The entire problem becomes more complicated if you think of making an appointment with a person who is also moving through the trade fair. In this case, the place where you may meet the other person is not crucial, but the time and place of the appointment must fulfill certain constraints which are given by the fact that the respective partners are moving around. Clearly, the meeting must take place at a point where the two persons' routes through the spatio-temporal environment cross each other. To achieve this goal, the exchange of information about the persons' tentative locations during the day at the trade fair becomes a part of our problem. We have to deal with the following questions:

• What is the informational structure of the type of problems described above that must be handled by the assistance system?
• What kinds of mental processes are involved in solving spatio-temporal configuration tasks of the type outlined above?
• What kinds of cognitive restrictions have to be dealt with?
• How can restrictions of human working memory be overcome by a cognitive assistance system?
• Which technical devices can be used for the interaction with the assistance system?

The trade fair environment described above has a well-defined spatio-temporal structure together with a set of corresponding constraints. However, not only the spatial and temporal constraints contribute to the complexity of the mental spatio-temporal configuration task; there are also the following categories of variables that influence the mental task to be performed:

People. Often there exist preferences for meeting people (i.e., a partial order), specifying whom you want to meet

first (or in any case), and who can be met at a later point in time.

Time and personal preferences. For example, you may want to arrange an important appointment in the morning or at the end of your day at the trade fair.

Place. The place of an appointment may also be chosen according to temporal and spatial constraints. For example, if you cannot find the time to meet a particular person during the fair, you may want to propose to convene in the evening for dinner.

The mental representations constructed by the trade fair visitor include instances of variables that have to be mentally rearranged according to the given situation. Obviously, there may exist more than one spatio-temporal configuration that satisfies a given set of constraints. The following section describes mental decision making processes in more detail and helps identify the requirements of the assistance tasks.

3 SOLVING THE PROBLEM MENTALLY
We will now have a look at the peculiarities associated with solving the presented type of problem mentally: where the problem's complexity is sufficiently high, mental problem solving can be expected to rely on the construction of mental models and on reasoning with them (cf. [11]). Factors that influence problem complexity are, for instance, the number of relevant constraints and the degree of variable interdependency. Mental models allude to the term model as used in formal model theory: they provide a semantics and instantiate a set of constraints. It is important to note that even where the construction of many models is theoretically possible, only few get constructed in mental problem solving. Usually, the mental mechanisms involved are thus quite different from the mechanisms employed by standard constraint solvers, and planning assistance systems have to take these differences into account. With respect to our trade fair scenario, we identify various types of knowledge that have been investigated in spatial cognition: to begin with, the list of appointments that need to be scheduled represents a (largely 1-dimensional) ordering of information, which is typically processed based on mental models (e.g., [12]). The trade fair plan conveys a (largely 2-dimensional) spatial overview of the general area to be visited; its mental correspondences have been described by metaphors such as cognitive maps, cognitive collages, atlases, or spatial mental models, depending on the field of research and the focus taken (e.g., [22], [10]). In the scenario, both types of information must be integrated with respect to the amount of time that our protagonist has available, given how long individual activities such as meetings take, and given the time required to get from one location to another. The temporal structure of the problem can be described as an ordering structure on time intervals related to events (cf. [1]).
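To make this constraint structure concrete, the following Python sketch encodes a stripped-down version of the scenario: activities have a location, a duration, and optionally a fixed time window, while walking times between halls constrain which orderings are feasible. The data layout, hall names, and walking times are illustrative assumptions rather than part of the system described in this paper; the depth-first search simply returns the first ordering that satisfies all constraints instead of enumerating every possible schedule.

    from dataclasses import dataclass
    from typing import Dict, List, Optional, Tuple

    # Walking times in minutes between halls (hypothetical values).
    WALK: Dict[Tuple[str, str], int] = {
        ("H6", "H10"): 15, ("H10", "H13"): 10, ("H6", "H13"): 20,
    }

    def walk_time(a: str, b: str) -> int:
        if a == b:
            return 0
        return WALK.get((a, b), WALK.get((b, a), 25))  # default guess for unlisted pairs

    @dataclass
    class Activity:
        name: str
        hall: str
        duration: int                              # minutes
        window: Optional[Tuple[int, int]] = None   # earliest start, latest end (minutes after 9:00)

    def first_feasible_schedule(activities: List[Activity], here: str = "H1",
                                now: int = 0, plan: Optional[list] = None):
        """Depth-first search returning the first ordering that satisfies all
        time windows and walking-time constraints, or None if none exists."""
        plan = plan or []
        if not activities:
            return plan
        for i, act in enumerate(activities):
            arrival = now + walk_time(here, act.hall)
            begin = arrival if act.window is None else max(arrival, act.window[0])
            end = begin + act.duration
            if act.window is not None and end > act.window[1]:
                continue                            # violates a fixed appointment window
            rest = activities[:i] + activities[i + 1:]
            result = first_feasible_schedule(rest, act.hall, end,
                                             plan + [(act.name, begin, end)])
            if result is not None:
                return result
        return None

    # Example: one fixed customer meeting plus two flexible booth visits.
    day = [
        Activity("Customer meeting", "H10", 30, window=(60, 120)),
        Activity("Booth A", "H6", 20),
        Activity("Booth B", "H13", 30),
    ]
    print(first_feasible_schedule(day))

An actual assistance system would additionally prune such a search using the user's current location and the time of day, and it would have to cope with constraints that change while the day unfolds.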

10

For our trade fair scenario, the use of mental models implies that the visitor will not search the entire problem space for all possible solutions to his scheduling needs and then determine which one of these to put into action; rather, his mental problem solving resembles a depth-first search with some (maybe non-systematic) backtracking and a preference to accept one of the first models encountered that fulfill the constraints to a tolerable degree. Determining the order in which constraints are instantiated in the model is therefore crucial to the problem solving, as it determines the subspace of the entire problem space that is searched. The actual order can be due to specific preferences, to the reasoner's experience with solving problems of the type under consideration, as well as to the structure of the current problem. It is these factors that determine which of all possible models will be constructed. However, not all constraints need necessarily be integrated into a single model. Rather, the problem may be split into partial problems whose solutions are integrated later on, depending on the high-level strategies chosen (cf. the human problem solving approach taken in [18], and the range of variations proposed since, e.g., in [2]).

Since the most constraining structure in the problem is a spatial one (the trade fair area to be navigated), and since the integration of all knowledge aspects (space, time, ordering of events) is sufficiently complex, it can be assumed that the mental representation will be in the form of mental images [5]. Mental images are a specific form of mental models that integrate spatial knowledge into a spatio-analogical representation format. This format exhibits a number of visual properties; in fact, the mental representation structure for mental images, the visual buffer, is based on neural structures that are also used in real vision processes (i.e., visual object or scene recognition, cf. [14]).

Whatever the actual high-level problem solving strategy may be, human mental reasoning is restricted in the amount of available attention and storage capacities. The fact that mental reasoning is based on the construction of specific models rather than of all models can be interpreted as an adaptation to these restrictions. Another principle towards an economic use of resources is outsourcing: mental representations are off-loaded to the environment, and analogies between mental and external representation structures are exploited to assist mental reasoning processes (cf. [24]). Writing a shopping list is a good example of creating an external representation that is meant to assist internal representations; making sketches and scribbles while reasoning is yet another.

4 ANALYSIS OF ASSISTANCE TASKS
We will now identify a number of basic requirements for assisting the mental solving of our trade fair problem. The first one addresses capacities of mental processing and draws on the observation that, typically, visual mental images as a specific, integrated type of mental representation are constructed whenever a sufficiently large number of

knowledge fragments are involved. Mental images instantiate a set of constraints in a quasi-pictorial manner. Along with an increase in specificity that necessarily goes along with the instantiation process (i.e., through graphical constraining, [19]), it is the specific representational format that allows one to read off novel bits of information. In this property, mental images exhibit many similarities to external diagrams: they offer free rides in reasoning [21], they group information that is needed at the same time, and they establish correspondences between knowledge fragments without the need to introduce labels [17]. Much has been written on the specific role that external diagrams, sketches, pictures, and graphs play in carrying out a variety of cognitive tasks: they may, for example, reduce the number of choices that a reasoner has to make while reasoning, or they may help him exploit the spatial dynamics in the environment (external computation, cf. [13]). Mental images and external diagrams contribute to a dialectics between a reasoner's inner processes and the external world [7], [8], in which mental constructions are externalized, internalized again, externalized, and so on. Generally, this close relation between mental images and external diagrams is attributed to a close coupling of imagery and visual perception (e.g., [6], [15]).

Limitations in storage and attentional capacities are inherent to mental reasoning processes. External representations do not have comparable storage restrictions; this is why we often take notes during mental reasoning. Where mental problem solving involves the construction of mental images, external diagrams possess specific representational qualities and should be employed in the assistive setting. Additionally, in cooperative assistance, external representations play a triple role, as they may be used as external notes for mental reasoning, serve to convey information from the human reasoner to the assistance system, or convey information from the system to the reasoner.

The second requirement which we can identify for assisting a mental solving of the trade fair problem is derived from the observation that mental problem solving is model-based, and that even where the construction of many models is theoretically possible, only few get constructed. The selection of those few is not arbitrary; rather, there exist preferences in mental model construction [12]. To provide an adequate context, the assistance system needs to keep track of the (set of) model(s) currently constructed by the user. Here, interaction paradigms are needed that describe how actions reflect changes in mental states. With respect to the assistance provided, two general modes are conceivable: in-line assistance, where the models proposed by the computational system are similar to those mentally constructed by the human reasoner, and complementary assistance, where the system provides a number of alternative models.

Figure 1 demonstrates two different alternatives of the trade fair visit as displayed on a PDA. The scheduled

11

appointments take place in halls H6, H10, H13, and H1. Visits to the booths in halls H5 and H7 are not fixed in time. The diagram on the display is a visualization of a mental image modeled by the assistance system. Therefore, only the halls which are important for the decision process are shown. The appointments are annotated to halls; routes are schematically denoted by map gestures (represented by arrows). The corresponding details of the appointments are listed below the diagram. The assistance system highlights the differences (e.g., H5 and H7) between the two alternative solutions. By clicking on the diagram area, the assistance system provides additional information according to the spatial context of the selection (e.g., top display: café near H6; bottom display: restaurant near H10).

Cognitively, the trade fair problem is especially demanding as the spatial and temporal constraints involved are not stable over time. That is, the set of relevant constraints may vary during the problem solving process, or the variation may be in the set of values that constitutes the legal assignments for a specific variable. It is important that the system mediates the dynamics to the user so that only changes in currently relevant parts of the problem are communicated (see the marked differences, highlighted in color, between the alternatives H5 and H7 in Fig. 1). A second form of dynamics lies in how the human problem solver approaches the decision space of the trade fair problem: typically, not all possible problem states will be considered and, commonly, at a given moment and a given place, not all possible candidates for the next decision will be taken into account. The selection of the actual subspace of the decision space that will be considered for the planning process is highly context-dependent; as a result, a varying extent of the subspace that gets considered introduces further dynamics (cf. [2]). Consequently, an assistance system needs to carefully select the information that it passes on to the human planner, and it needs to tune this information depending on the user's actual location in time and space. The planning context needs to be maintained while the user moves in space and while time goes on. On the other hand, the assistance system needs to keep track of model changes (i.e., when the human planner decides to momentarily consider appointments later in the day, or when he rearranges the current schedule). The user's current spatio-temporal context provides some indication as to which parts of the decision space form part of the actual mental model, and which probably do not. It is this requirement to carry on with planning and interaction tasks while changing spatial and temporal locations that calls for a ubiquitous computing infrastructure.

5 OUTLINE OF A MODEL-BASED ASSISTANCE SYSTEM
As described above, the mental model-based assistance system supports our user in highly dynamic spatio-temporal situations. However, we have to define media that support the user in the spatio-temporal environment.

support the user in the spatio-temporal environment. The following requirements should be satisfied:
User mobility. The assistance system should provide user mobility [3].
Capability to visualize mental images. During the assistance tasks the user's mental model (i.e. in-line assistance) and related alternatives (i.e. complementary assistance) should be visualized.
Awareness of changes. The user should be informed of changes in the appointment schedule and be able to perform changes himself.

Figure 1. External representation of two alternative mental images.
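To make the notion of alternative models somewhat more concrete, the following sketch is our own illustration (not part of the system described here) of how the fixed appointments and a flexible booth visit of the trade fair scenario could be represented, and how alternative schedules could be enumerated. All class and method names (Visit, alternatives) are hypothetical, and walking times are reduced to a constant for brevity.

```java
import java.util.*;

/** Minimal sketch: a fixed appointment or a flexible booth visit at the trade fair. */
record Visit(String hall, Integer startMinute, int durationMinutes) {
    boolean isFixed() { return startMinute != null; }
}

public class AlternativePlans {
    /**
     * Enumerates alternative schedules by inserting the flexible visit into every
     * gap between consecutive fixed appointments where it fits, taking a
     * (simplified) constant walking time between halls into account.
     */
    static List<List<Visit>> alternatives(List<Visit> fixed, Visit flexible, int walkMinutes) {
        List<List<Visit>> plans = new ArrayList<>();
        for (int gap = 0; gap < fixed.size() - 1; gap++) {
            int gapStart = fixed.get(gap).startMinute() + fixed.get(gap).durationMinutes();
            int gapEnd = fixed.get(gap + 1).startMinute();
            if (gapEnd - gapStart >= flexible.durationMinutes() + 2 * walkMinutes) {
                List<Visit> plan = new ArrayList<>(fixed);
                plan.add(gap + 1, new Visit(flexible.hall(), gapStart + walkMinutes,
                        flexible.durationMinutes()));
                plans.add(plan);
            }
        }
        return plans;
    }

    public static void main(String[] args) {
        // Fixed appointments from the trade fair scenario (times as minutes after midnight).
        List<Visit> fixed = List.of(
                new Visit("H6", 9 * 60 + 30, 60),   // 09:30
                new Visit("H10", 11 * 60 + 30, 60), // 11:30
                new Visit("H13", 14 * 60 + 30, 60), // 14:30
                new Visit("H1", 16 * 60 + 30, 60)); // 16:30
        Visit boothH5 = new Visit("H5", null, 30);  // not fixed in time
        alternatives(fixed, boothH5, 15).forEach(System.out::println);
    }
}
```

Each element of the returned list corresponds to one alternative model that could either mirror the user's own plan (in-line assistance) or be offered as a competing alternative (complementary assistance).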

Nowadays, there exists a great variety of sophisticated communication and desktop devices. From the described requirements it follows that we need, on the one hand, a mobile communication device and, on the other hand, a device that is capable of visualizing and interacting with mental images (i.e. externalization and internalization of diagrams). A device that satisfies all of the requirements is a PDA. However, PDAs only provide a poor ability to interact with diagrams. The best devices for interacting with mental images are smart boards. In the following, we will discuss how the combination of the two device types operates in a ubiquitous computing environment. Subsequently, different communication scenarios (e.g. multi-user multi-device communication) will be discussed.

5.1 Multi-User and Multi-Device Communication

The ubiquitous computing domain proposed here employs wearable PDAs and stationary smart boards. Both types of devices are network enabled. The protocols responsible for the secure transfer of appointments and preferences are integral parts of the assistance system. The smart boards in the proposed setting are used by many anonymous users. Fundamental issues of designing multi-user multi-device interfaces were discussed in [16]. Among others, management, technical, and social issues and the level of trust between the users must be taken into account to avoid confusing situations. Different approaches with respect to secure data access are discussed in [9]. It is assumed that the users can transfer their personal schedules from the PDAs to the smart boards and vice versa. For the sake of a simple communication scenario, the examples in this paper are reduced to two communication partners. Nevertheless, scenarios with more than two users are also needed for practical applications. According to the technical prerequisites described above, the following communication scenarios can be considered:
Two mobile users with PDAs. To arrange a meeting, users have to exchange their preferences for a possible appointment. Because of the limited input capabilities of the small-sized devices, information should be communicated as precisely as possible to overcome difficulties related to data input.
User with PDA and user at smart board. A user standing at a smart board has the opportunity to view large-scale alternatives of possible solutions. Nevertheless, after the selection of an appropriate time and place, the appointments communicated to a PDA user have to be precise to reduce the complexity of the interaction with the small device.
Two users at one smart board. The interaction between two users sharing the same smart board is probably the most convenient form of interaction with respect to human-to-human communication. The large display of a smart board can be divided into two separate views visualizing two different mental models.
Two users at two different smart boards. Unlike users at the same smart board, users at two different smart boards have the whole device for their own purposes. Alternative solutions can be easily visualized and communicated to the other communication partner at a larger scale than on a PDA.
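The paper leaves the concrete transfer protocol open; the following sketch merely illustrates, under our own assumptions, what a schedule-transfer message between a PDA and a smart board might look like. All names (Appointment, ScheduleTransfer, the wire format) are invented, and the pairing of companies with halls is purely illustrative.

```java
import java.util.List;

public class ScheduleTransferSketch {

    /** Hypothetical appointment entry as it might be exchanged between devices. */
    record Appointment(String date, String time, String hall, String partner) {}

    /** Hypothetical transfer message: one user's schedule plus meeting preferences. */
    record ScheduleTransfer(String userId, List<Appointment> schedule, List<String> preferences) {
        /** Serializes the message into a simple line-based wire format (illustration only). */
        String toWireFormat() {
            StringBuilder sb = new StringBuilder("SCHEDULE " + userId + "\n");
            for (Appointment a : schedule) {
                sb.append(String.join(";", a.date(), a.time(), a.hall(), a.partner())).append("\n");
            }
            sb.append("PREFS ").append(String.join(",", preferences)).append("\n");
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        ScheduleTransfer msg = new ScheduleTransfer("user-42",
                List.of(new Appointment("21.03.2004", "09:30", "H6", "Microsoft"),
                        new Appointment("21.03.2004", "11:30", "H10", "DFKI")),
                List.of("prefer-morning", "max-walk-15min"));
        // The secure transfer layer required in the text would encrypt and authenticate this payload.
        System.out.print(msg.toWireFormat());
    }
}
```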

After describing how and what kind of information mobile users exchange for the arrangement of appointments, the question arises of how positioning information (e.g. the current user position or an appointment location) should be acquired and communicated. The proposed assistance system should provide location-based events, for instance the availability of a smart board in the vicinity of our user. Therefore, localization information is needed.
Localization information. Continuous tracking of a user's position is not necessary; rather, her position has to be detected on demand in dedicated situations. Mobile users have to be able to send information about the place where they want to meet to their communication partners; they may also want to localize themselves when they get lost. The positioning information can be provided by wireless networks capturing the positions of mobile devices (cf. [23]). The assistance system provides an interactive map containing landmarks, e.g. booth numbers or names of snack bars, which can be chosen by clicking. Mobile users can identify their current position with respect to these landmarks themselves and send landmark positions to identify the place of initiated appointments.
This paper has outlined a number of ideas towards creating a mental model-based assistance tool for spatio-temporal planning in ubiquitous computing environments. However, many open issues remain:
• How can mental images be adequately modeled computationally when considering their quasi-pictorial nature?
• How can the representations of alternative models be adapted to a particular situation context?
• What are the advantages of using mental model-based approaches in this scenario compared to other approaches (such as statistical ones, or simple shortest-path algorithms)?
• Can alternatives for spatio-temporal problem solutions be generated individually according to the actual spatial knowledge of the user, e.g. for familiar / unfamiliar environments?

6 CONCLUSION

In the scope of this paper we have provided a concept for a mental model-based assistance system which assists in solving dynamic spatio-temporal tasks. The proposed assistance system employs ubiquitous computing technologies in combination with mental model-based approaches to provide assistance for tasks that are beyond simple mental tractability. Mobile PDAs and stationary smart boards are employed as external media as they fit the requirements on user mobility and seamless interaction with mental images. From the described multi-user and multi-device scenarios it follows that device interaction capabilities are crucial for the amount of information that can be exchanged.


Finally, being aware of location-based events (e.g. the vicinity and availability of stationary devices) contributes to ubiquitous spatio-temporal assistance. Eventually, the proof of the pudding is in the eating: it is through implementations of computational cognitive models for concrete assistance tasks, such as the trade fair scenario sketched here, that such models can prove that they in fact have the potential to improve human-computer communication and to enable effective collaborative human-computer reasoning.

ACKNOWLEDGMENTS

We thank the two anonymous reviewers for their constructive comments. We gratefully acknowledge financial support by the Deutsche Forschungsgemeinschaft (DFG) through the Transregional Collaborative Research Center SFB/TR 8 Spatial Cognition: Reasoning, Action, Interaction (project R1-[ImageSpace]).

REFERENCES

[1] Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11), 832-843.
[2] Bertel, S., Freksa, C., & Vrachliotis, G. (2004). Aspectualize and conquer in architectural design. In J. S. Gero, T. Knight, & B. Tversky (Eds.), Visual and Spatial Reasoning in Design III (pp. 255-279). Key Centre of Design Computing and Cognition, U of Sydney.
[3] Coyne, R., Rosenman, M., Radford, A., Balachandran, M., & Gero, J. (1990). Knowledge-based design systems. Reading, MA: Addison-Wesley.
[4] ETSI (1995). Technical Committee Reference Technical Report TCR-TR 007, Universal Personal Telecommunication (UPT); UPT Vocabulary.
[5] Finke, R. (1989). Principles of mental imagery. Cambridge, MA: MIT Press.
[6] Finke, R. (1990). Creative imagery: Discoveries and inventions in visualization. Hillsdale: Lawrence Erlbaum.
[7] Goldschmidt, G. (1991). The dialectics of sketching. Design Studies, 4: 123-143.
[8] Goldschmidt, G. (1995). The designer as a team of one. Design Studies, 16: 189-209.
[9] Hangartner, U., & Steenkiste, P. (2003). Access Control to Information in Pervasive Computing Environments. To appear in Proc. of 9th Workshop on Hot Topics in Operating Systems (HotOS IX), Lihue, HI, May 2003.
[10] Hirtle, S. C. (1998). The cognitive atlas: using GIS as a metaphor for memory. In M. Egenhofer & R. Golledge (Eds.), Spatial and temporal reasoning in geographic information systems (pp. 267-276). Oxford University Press.

[11] Johnson-Laird, P. N. (1983). Mental models. Cambridge, MA: Harvard University Press.
[12] Knauff, M., Rauh, R., & Schlieder, C. (1995). Preferred mental models in qualitative spatial reasoning: A cognitive assessment of Allen's calculus. Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society (pp. 200-205). Mahwah, NJ: Lawrence Erlbaum.
[13] Kirsh, D. (1995). The intelligent use of space. Artificial Intelligence, 73: 31-68.
[14] Kosslyn, S. M. (1994). Image and brain - The resolution of the imagery debate. Cambridge, MA: MIT Press.
[15] Kosslyn, S. M., & Thompson, W. M. (2003). When is early visual cortex activated during visual mental imagery? Psychological Bulletin, 129(5): 723-746.
[16] Kray, C., Wasinger, R., & Kortuem, G. (2004). Concepts and issues in interfaces for multiple users and multiple devices. Proc. of the workshop on Multi-User and Ubiquitous User Interfaces (MU3I) at IUI 2004 (pp. 7-11). Funchal, Madeira, Portugal.
[17] Larkin, J. H., & Simon, H. A. (1987). Why a diagram is (sometimes) worth ten thousand words. Cognitive Science, 11: 65-99.
[18] Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice Hall.
[19] Scaife, M., & Rogers, Y. (1996). External cognition: how do graphical representations work? Int. J. Human-Computer Studies, 45: 185-213.
[20] Schlieder, C., & Hagen, C. (2000). Interactive layout generation with a diagrammatic constraint language. In C. Freksa, C. Habel, & K. F. Wender (Eds.), Spatial cognition II - Integrating abstract theories, empirical studies, formal methods, and practical applications (pp. 198-211). Berlin: Springer.
[21] Shimojima, A. (1996). Operational constraints in diagrammatic reasoning. In Allwein & Barwise (Eds.), Logical reasoning with diagrams (pp. 27-48). Oxford University Press.
[22] Tversky, B. (1993). Cognitive maps, cognitive collages, and spatial mental models. In A. Frank & I. Campari (Eds.), Spatial information theory (pp. 14-24). Berlin: Springer.
[23] Wang, Y., Jia, X., & Lee, H.K. (2003). An indoors wireless positioning system based on wireless local area infrastructure. Presented at SatNav 2003, The 6th International Symposium on Satellite Navigation Technology Including Mobile Positioning & Location Services.
[24] Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin & Review, 9: 625-636.


The Bum Bag Navigator (BBN): An Advanced Pedestrian Navigation System

Ilhan Aslan and Antonio Krüger
Dept. of Computer Science, Saarland University, Germany
[email protected], [email protected]

Abstract

In this paper we present a light and wearable mobile navigation platform which differs from available systems in the way information is presented to the user and in the way the user interacts with the system. The platform provides multiple displays to convey information; it is modularly constructed and distributed across two PDAs. Thus, the PDAs can share resources and solve arising tasks in dialogue with each other. The communication between the separated modules is realized via WLAN and a client-server architecture. One of the PDAs can be used uncoupled from the wearable design of the platform and serves as a component for user input in different modalities, depending on the user's context, whereas the other PDA is used with a VGA card to drive a clip-on display that is attachable to glasses. Even though the system provides two displays, it is (due to its wearability) usable hands-free. Being a flexible platform, it can be easily reconfigured to test and evaluate different features and usability aspects of a pedestrian navigation service (e.g. to carry out navigation experiments in a Wizard-of-Oz-like study). In particular, this paper focuses on the combined usage of an uncoupled PDA module and a head mounted clip-on display in pedestrian navigation. Furthermore, it proposes advanced methods to present information and to interact with such a system.

INTRODUCTION

Methods for interacting with mobile systems and methods for presenting information on mobile systems have become a major topic. This development is mainly attributable to the increased use of mobile devices in everyday life (i.e. cell phones and PDAs). However, the performance of such mobile hardware depends on its size and always lags behind the performance of the more widespread desktop hardware. As a consequence, the range of software available for mobile systems is often a reduced version of its desktop counterparts, and the resources (e.g. memory) of mobile devices are often overloaded. The platform that we present in this paper distributes the navigation software across two mobile devices and in doing so increases the performance and the amount of available resources. We suggest that this concept will be used more often in mobile scenarios in the near future. Today's handheld systems do not provide sufficient methods for interaction. PDAs have a touch-sensitive display and users handle a plastic pen to interact with the PDA. Advanced displays that combine interaction and presentation are only satisfying when display sizes are sufficient (i.e. smart boards or tablet PCs). Handheld displays still have a standard resolution of 320x240 pixels; that is, the presentation surface is very limited, and therefore it is difficult to use the surface for a graphical user interface as well. The interaction with a pedestrian navigation system is, in particular, different from the interaction with a navigation system for vehicles. Vehicle navigation systems prefer speech recognition and speech synthesis to communicate with the driver. The driver is generally busy and is, hopefully, not driving hands-free. Speech is also a suitable method to communicate with a pedestrian navigation system; however, depending on the context (i.e. if the user is not alone), speech is not the best solution. The user of a pedestrian navigation system is in a situation where a more regular interaction than just choosing the route at the beginning of the navigation is appropriate. Therefore, the user of a pedestrian navigation system needs a different user interface for input and output. We decided to use a head mounted clip-on display mainly for presenting information in different styles (i.e. abstract route information or information in a 3D pedestrian view) to the user. In contrast to other mobile displays, head mounted displays (HMDs) are omnipresent; that is, the user is free to move her head without losing focus on the display. An HMD is only visible to its user and thus provides privacy; and, last but not least, an HMD is wearable and usable hands-free. Pedestrian navigation very often takes place in combination with touristy activities, and the user may be surrounded by other people. We also decided to use an additional PDA that provides a second display, which can be used as a graphical interface and which furthermore provides an additional overview of the surrounding environment. This design decision (see figure 1) results from an experiment which showed that a purely incremental, step-by-step presentation of route information does not provide enough information to build up survey knowledge [2]. Survey knowledge may be described as geographical overview knowledge (i.e. knowledge of the positions of landmarks in relation to each other).

Figure 1: The Bum Bag Navigator

The pedestrian navigation system that is described in this paper provides different modalities to interact with the system and to present information. We will present all modalities. In its default mode, our system displays a 2D top-down view of the user's environment on the bum-bag display and a 3D birds-eye view on the clip-on display (see figure 1), because users prefer a combination of a 2D and a 3D map [3]. We will also briefly describe two experiments in which the platform was configured and used to test usability and psychological aspects of mobile assistants in different scenarios. The psychological experiment and its results influenced the final design of the Bum Bag Navigator (BBN).

System Architecture

The BBN is an example application that presents the usage of multiple displays in pedestrian navigation. Because of the results of our experiments in [2], we decided to present route and overview information in parallel on different displays. We designed the system in such a way that the presentation style of the environment fits the natural movement of the user: when the user looks straight ahead, she can see a virtual model of her environment in pedestrian or birds-eye view; when she looks down, she can see an overview of her surrounding environment in top-down view (see figure 2).

Figure 2: Styles of view corresponding to head movements.

For realizing our design decision, one of the PDAs has been placed on top of a bum bag, with the display facing up towards the user. We integrated all hardware parts of the system into that small bum bag, which can easily be worn around the waist of the user. The bum bag itself consists of two parts, the top of the bum bag (see figure 3) and the main part of the bum bag (see figure 4). We prepared a piece of Styrofoam and embedded inside it one of the PDAs¹, which drives a tilt device, together with a Bluetooth receiver to which the PDA is connected. We also prepared a cloth cover, which has an opening for the PDA display and into which the Styrofoam fits. We use buckles and Velcro on the bum bag in order to keep the top of the bum bag in place (see figure 3). The main part of the bum bag contains the second PDA, a VGA adapter and the control and power box of the HMD². The overall weight of the bum bag (including batteries for the clip-on display) is less than 700 grams. The HMD is connected to the control box inside the bum bag via a cable. Bum bags are well-known travelling bags and standard clothing articles for travellers; they do not attract too much attention. The MicroOptical Corporation also offers a wireless version of the head mounted clip-on display. The clip-on itself remains the one hardware component that attracts attention and comes across as rather artificial.

Figure 4: The bum bag and its components

The basic software component of the BBN is our general purpose navigation platform M3I [1]. It provides a 3D VRML model of the surrounding environment and a user interface based on speech and stylus interaction. The BBN distributes the underlying M3I navigation system across two PDAs; more precisely, the BBN runs in different modes: in standalone mode, where the application runs only on one PDA and where only M3I functionalities are activated, or in distributed mode, where the application is distributed across a 2-tier architecture with a client and a server part. The server part is running on the PDA that drives the HMD (the glasses-PDA); because the HMD consumes a lot of processor time and memory space, the glasses-PDA, which serves client requests with different presentation styles, is not responsible for the evaluation of localisation information. The GPS data is evaluated by the PDA in the top of the bum bag (the bum-bag-PDA), which is the input interface of the system and therefore also responsible for any other kind of input (i.e. user input) in any modality (i.e. speech or gesture). The glasses-PDA is remote controlled by the bum-bag-PDA through a well-defined communication protocol that has multiple layers (see figure 5b). The operating system running on both PDAs is Pocket PC 2002, and according to Microsoft, Winsock, which is a programming library, provides the only way for an application to access TCP/IP on a Windows CE-based device. Figure 5a demonstrates a typical server-client dialog.
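As a rough, language-neutral illustration of the layered request/response idea just described, the following sketch shows how a client (the bum-bag-PDA) might send line-based commands and GPS sentences to the server (the glasses-PDA). The actual BBN uses Winsock on Pocket PC 2002; the protocol keywords, port number and class names below are invented for this example.

```java
import java.io.*;
import java.net.Socket;

/** Hypothetical client side of the bum-bag-PDA / glasses-PDA communication. */
public class GlassesClient {
    private final Socket socket;
    private final PrintWriter out;
    private final BufferedReader in;

    public GlassesClient(String serverHost, int port) throws IOException {
        socket = new Socket(serverHost, port);
        out = new PrintWriter(socket.getOutputStream(), true);
        in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
    }

    /** Layer 1: command-level messages, e.g. switching the presentation style. */
    public String sendCommand(String command) throws IOException {
        out.println("CMD " + command);   // e.g. "CMD PREVIEW"
        return in.readLine();            // server acknowledges or reports an error
    }

    /** Layer 2: forward a raw GPS sentence for the server to parse and visualize. */
    public String sendGpsSentence(String nmeaSentence) throws IOException {
        out.println("GPS " + nmeaSentence);
        return in.readLine();
    }

    public void close() throws IOException { socket.close(); }

    public static void main(String[] args) throws IOException {
        GlassesClient client = new GlassesClient("192.168.0.2", 4711); // invented address and port
        System.out.println(client.sendCommand("PREVIEW"));
        client.close();
    }
}
```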

Figure 3: The top of the bum bag and its contents

¹ HP iPAQ 5450
² MicroOptical


Figure 5: Client-Server architecture and communication protocol

We use DotPocket, an external application that provides landscape view in different resolutions. As the HMD only works at a resolution of 640x480, the best visual result is achieved when the glasses-PDA presents information in landscape view at a resolution of 640x480 pixels; otherwise, the image that is displayed on the HMD is distorted.

A Brief Review of the Special Modes of the BBN
The BBN provides five special features implemented for the usage of multiple displays in navigation and, overall, as mentioned above, for the usage of head mounted displays in combination with an uncoupled module that provides a mobile display and different methods of interaction.

Extra-View (x-view)
In the Extra-View feature, the head mounted display extends the "static" bum bag display vertically or horizontally. As the head mounted display is perceived as omnipresent, the concatenation of bum bag display and head mounted display is achieved by head moves. One has to consider the fact that it is not necessary to move one's eyes. Figure 6b demonstrates the idea of Extra-View. In addition to head moves, it is possible to position the "ghost display" by adjusting the clip-on (figure 6b). Why should someone implement this feature, and is it really useful? The view of the user is extended to four times the area that is provided by the bum bag display, and even eight times if diagonal extensions were used. This feature is comparable to zooming out. However, since zooming is limited to the size of the display, the Extra-View feature takes advantage of the multi-display architecture of the BBN. In the scope of pedestrian navigation the user can use Extra-View to get an overview of the environment; besides, it is fun to experiment with, as it requires a special form of user interaction (head moves).

Figure 6: Extra-View

Abstract-View (a-view)
An intelligent system is situation adaptive. A tourist, for example, needs a different presentation than a businessman, who may need a less time-consuming and better mapped-out presentation. The Abstract-View delivers abstract way descriptions by dynamically filtering information out of the internal navigation data structure and creating HTML files from it. The HTML files are then automatically presented inside the Internet Explorer. The presentation is shown on the clip-on display, because the clip-on display is easier to use when the user is moving. What kind of navigation information can be presented abstractly? Information that can be processed successively, like route knowledge, can be presented step by step. If one considers street crossings as landmarks, then Abstract-View displays the next two segments of route knowledge. Two successive segments are identified by three landmarks. Abstract-View displays the name of the street on which the user is currently localised and the names of the next two streets on the route to her destination. The street names are augmented with direction information; that is, the street names are presented like traffic signs (see figure 7). This design was chosen because traffic signs are familiar structures for presenting abstract direction information; therefore a greater user acceptance is expected. Additionally, time information is given about the current segments and the overall progress of the route.

Figure 7: Abstract-View example
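The following sketch illustrates, under our own assumptions, how such an abstract, traffic-sign-like description could be generated as HTML from the next two route segments. The class, field and street names are invented; the BBN itself filters its internal navigation data structure as described above.

```java
/** Hypothetical generation of an Abstract-View-style HTML page (illustration only). */
public class AbstractViewSketch {

    record Segment(String streetName, String turnDirection, int minutes) {}

    /** Builds the page: current street plus the next two streets, styled like traffic signs. */
    static String toHtml(String currentStreet, Segment first, Segment second, int totalMinutesLeft) {
        return """
                <html><body>
                  <p>You are on: %s</p>
                  <div class="sign">%s (%s, %d min)</div>
                  <div class="sign">%s (%s, %d min)</div>
                  <p>Remaining time to destination: %d min</p>
                </body></html>
                """.formatted(currentStreet,
                        first.streetName(), first.turnDirection(), first.minutes(),
                        second.streetName(), second.turnDirection(), second.minutes(),
                        totalMinutesLeft);
    }

    public static void main(String[] args) {
        // Street names are purely illustrative.
        Segment first = new Segment("Stadionstrasse", "turn right", 4);
        Segment second = new Segment("Zooallee", "straight ahead", 6);
        System.out.println(toHtml("Campusweg", first, second, 15));
    }
}
```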


Preview (p-view)
In general, a navigation system guides the user from location A to location B. The BBN provides a feature that displays the rest of the route as a camera fly from the actual position to the destination (i.e. location B); the camera fly presents information in birds-eye view. At any point of the route, the user has the possibility to preview the rest of the route. In the BBN, routes are defined through segment and subsegment objects. In every subsegment object, the start coordinates and end coordinates of the subsegment and the turning angle to the next subsegment are stored. The virtual camera flies along the virtual route and adjusts itself using the data of the current subsegment.
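A minimal sketch of this idea, under our own assumptions, is shown below: subsegments store start/end coordinates and the turning angle to the next subsegment, and a virtual camera is advanced along them. Names and the interpolation scheme are invented; the BBN renders the fly-through in its VRML model.

```java
/** Hypothetical Preview camera fly over a list of subsegments. */
public class PreviewFly {

    record SubSegment(double startX, double startY, double endX, double endY, double turnAngleDeg) {}

    /** Moves the camera from start to end of each subsegment in small steps. */
    static void flyRoute(java.util.List<SubSegment> route, int stepsPerSegment) {
        double headingDeg = 0.0;
        for (SubSegment s : route) {
            for (int i = 0; i <= stepsPerSegment; i++) {
                double t = (double) i / stepsPerSegment;
                double camX = s.startX() + t * (s.endX() - s.startX());
                double camY = s.startY() + t * (s.endY() - s.startY());
                placeCamera(camX, camY, headingDeg); // would update the 3D view in a real system
            }
            headingDeg += s.turnAngleDeg();          // rotate towards the next subsegment
        }
    }

    static void placeCamera(double x, double y, double headingDeg) {
        System.out.printf("camera at (%.1f, %.1f), heading %.0f degrees%n", x, y, headingDeg);
    }

    public static void main(String[] args) {
        flyRoute(java.util.List.of(
                new SubSegment(0, 0, 0, 50, 90),    // straight ahead, then turn
                new SubSegment(0, 50, 40, 50, 0)), 5);
    }
}
```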

The acceleration is given as an angle ranging from 90 degrees (π/2) to -90 degrees (1.5π). The walkthrough is realized by updating the position and direction of the virtual camera, using cosine and sine functions applied to the acceleration angle. In the scope of navigation, the usage of the tilt device feature to explore the virtual environment is most appropriate either before the navigation task begins or in situations where the user has to wait for an intermediate event to take place; for example, waiting for the bus, or waiting for someone the user wants to meet. In [8], experiments are described in which participants showed that they had learned spatial relations from exploring virtual environments.
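To make the walkthrough update described in the preceding paragraph concrete, the following sketch maps the two tilt angles to camera motion with sine and cosine. Scaling factors, update rate and method names are our own assumptions; the BBN reads the angles through the tilt device's SDK.

```java
/** Hypothetical tilt-control walkthrough: tilt angles steer a virtual camera. */
public class TiltWalkthrough {
    private double x, y;           // camera position in the virtual scene
    private double headingRad;     // current viewing direction

    /**
     * @param verticalTiltDeg   forward/backward tilt in [-90, 90]; controls speed
     * @param horizontalTiltDeg left/right tilt in [-90, 90]; steers like a wheel
     */
    public void update(double verticalTiltDeg, double horizontalTiltDeg, double dtSeconds) {
        double speed = verticalTiltDeg / 90.0 * 2.0;                 // at most 2 scene units per second
        headingRad += Math.toRadians(horizontalTiltDeg) * dtSeconds; // turn proportionally to the tilt
        x += Math.cos(headingRad) * speed * dtSeconds;
        y += Math.sin(headingRad) * speed * dtSeconds;
    }

    @Override
    public String toString() {
        return String.format("pos=(%.2f, %.2f), heading=%.1f deg", x, y, Math.toDegrees(headingRad));
    }

    public static void main(String[] args) {
        TiltWalkthrough cam = new TiltWalkthrough();
        for (int frame = 0; frame < 5; frame++) {
            cam.update(45, 10, 0.1);   // tilted forward and slightly to the right
            System.out.println(cam);
        }
    }
}
```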

User studies that used the BBN platform

Photo Graphics
The BBN provides the possibility to present augmented photo graphics at crossroads to clarify the given direction. However, it is not clear whether this kind of navigation aid (see figure 8) is necessary, and whether it is possible to automate the virtual augmentation of photo graphics. The experiment that is described in [2] also observed the usability aspect of augmented photo graphics. Former investigations into automating sketch generation for mobile route descriptions are described in [4].

The BBN platform was used in two user studies. Both user studies investigated aspects of pedestrian navigation and assistance systems. One of the studies concentrated on the investigation of usability aspects of a shopping assistant similar to the one described in [5], while the other study investigated psychological aspects of pedestrian navigation systems. In both user studies two PDAs were used, one serving as an input interface and the other as an output interface.

The Zoo-Experiment

Figure 8: Augmented photo

Tilt-Control Walkthrough
The top of the BBN itself becomes a user interface to the navigation system when the tilt control feature is activated. By tilting the top of the bum bag (see figure 9) horizontally or vertically it is possible to interact with, and remotely control, what is displayed on the clip-on display. For example, the top of the bum bag can be used like a steering wheel to walk (drive) through the virtual scene. The speed of the walkthrough depends on the tilt angle. The SDK of the tilt device enables programmers to read out the horizontal and vertical acceleration.

In the Zoo-Experiment we investigated the relationship between pedestrian navigation and the acquisition of spatial knowledge. The acquisition of spatial knowledge is very important in the context of pedestrian navigation. A simple example situation that demonstrates the general importance of the acquisition of spatial knowledge is the situation where people do not remember where they parked their cars the night before. Independent of the influence of alcohol, some people have more problems than others building up spatial knowledge. Former investigations showed sex-related differences in methods of wayfinding [6]. Our goal was to investigate the effect of step-by-step route information presented at landmarks, similar to the presentation style shown in figure 8, because, according to [7], learning is organized around the nodes of the decision system, the landmarks, while learning between landmarks is incidental and largely irrelevant. Our results show that the presentation of step-by-step information helps learning directional knowledge; however, it fails to help building up survey knowledge. We describe our results in more detail in [2]. The Zoo-Experiment was carried out as a Wizard-of-Oz-like study; that is, the experimenter used his PDA to remote-control the PDA of the participant (see figure 10).

Figure 9: Tilt control user interface

Figure 10: Experimenter and participant


Usability of a shopping assistant
In this experiment, usability aspects of a shopping assistant that is based on decision-theoretic planning will be analysed. This experiment is currently in progress. Due to the experiences with the first user study [5], the current shopping assistant provides the subjects with an overview map and highlights the current position of the user and the position of the next intermediate destination (i.e. a shop) on the map. Figure 11 shows the user interface of the experimenter (the input interface) and the user interface of the subjects (the output interface).

REFERENCES
[1] Wasinger, R., Stahl, C., Krüger, A. (2003). M3I in a pedestrian navigation & exploration system. Mobile HCI 2003, Springer.
[2] Krüger, A., Aslan, I., Zimmer, H. The effects of Mobile Pedestrian Navigation Systems on the Concurrent Acquisition of Route and Survey Knowledge. Mobile HCI 2004 (to appear).
[3] Rakkolainen, I., Timmerheid, J., Vainio, T. A 3D City Info for Mobile Users. Proceedings of the 3rd International Workshop in Intelligent Interactive Assistance and Mobile Multimedia Computing (IMC 2000), November 9-10, 2000, Rostock, Germany, 115-212.
[4] Butz, A., Baus, J., Krüger, A., Lohse, M. Some Remarks on Automated Sketch Generation for Mobile Route Descriptions. Proceedings of the 1st Symposium on Smart Graphics, ACM Press, New York, 2001.

Figure 11: The user interfaces of the shopping-assistant

Conclusion and Future Work
In this paper we have presented a first example of a pedestrian navigation platform that takes into account the results of the experiment documented in [2]. We used two displays to convey navigation information, because we suggest that providing an overview map of the user's surrounding environment combined with step-by-step route information will help the user to learn survey knowledge. We hope that by developing navigation systems which help to acquire survey knowledge, problems that can appear during navigation (e.g. GPS satellites not being available or roads being blocked) will not have severe consequences, as the user will have learned enough survey knowledge to easily reorient herself. In the future, we will refine our navigation system, and we still have to prove our hypothesis that the combined presentation of route and overview information helps to build up survey knowledge. We also plan to carry out an experiment that will help us to find out the best way/modality to convey information to users on multiple displays at the same time.

[5] Bohnenberger, T., Jameson, A., Krüger, A., & Butz, A. (2002). User acceptance of a decision-theoretic, location-aware shopping guide. In Gil, Y. (Ed.), IUI 2002: International Conference on Intelligent User Interfaces. New York: ACM, 178-179.
[6] Devlin, A.S., & Bernstein, J. (1995). Interactive wayfinding: Use of cues by men and women. Journal of Environmental Psychology, 15, 23-28.
[7] Werner, S., Krieg-Brückner, B., & Herrmann, T. (2000). Modelling navigational knowledge by route graphs. In Ch. Freksa et al. (Eds.), Spatial Cognition II, 1849, pp. 295-316. Springer-Verlag, Berlin Heidelberg.
[8] Gillner, S., & Mallot, H.A. (1998). Navigation and acquisition of spatial knowledge in a virtual maze. Journal of Cognitive Neuroscience, 10, 445-463.


Strategies for Self-Organization and Multimodal Output Coordination in Distributed Device Environments

Christian Elting
European Media Lab GmbH, Heidelberg, Germany
[email protected]

Michael Hellenschmidt
Fraunhofer IGD Darmstadt, Darmstadt, Germany
[email protected]

Enabling this kind of ambient intelligence within mobile environments and the ability of devices to configure themselves into such a coherently acting ensemble, requires more than anticipating possible device combinations. Usually it is also impossible for the user to define the behaviour of device ensembles, because she would need detailed knowledge of the devices and their abilities. Moreover devices ensembles could be too large and the contexts of use too numerous to cover every situation by hand. Here software-infrastructures are needed, which are able to deal with dynamic changes in the device ensemble without requiring manual modification. Self-organizing mechanisms and conflict resolution strategies are necessary to establish co-operation between devices and to realize a coherent ensemble.

Abstract One of the great research challenges currently facing the Ubicomp community is the development of mechanisms for groups of devices to dynamically coordinate with one another in meaningful ways. This paper presents results of the project DynAMITE, which develops mechanisms for self-organizing device environments. Based on the SodaPop middleware model for architectural integration we developed conflict resolution strategies for distributed device environments. The multimodal presentation strategy supporting graphical user interfaces, speech synthesizers and virtual characters is presented here as well as the open source implementation, which is available from the project web site.

Keywords

The project DynAMITE (Dynamic Adaptive Multimodal IT-Ensembles, [9]) faces the challenge of developing and implementing software infrastructures that allow the self-organization of ad-hoc appliance ensembles. To address these challenges, DynAMITE covers different research areas within the field of ubiquitous computing and ambient intelligence:

Self-organization, ambient intelligence, ubiquitous computing, multimodal output presentation, conflict resolution strategies.

INTRODUCTION
In the near future, a multitude of information appliances and smart artifacts will characterize everyone's personal environment. To make the vision of co-operative device ensembles, which enable proactive support of the user, come true, coherent teamwork among the environment's appliances needs to be established. This is a main requirement for realizing the visions of Ambient Intelligence [1,8,37]. The following scenarios illustrate the behaviour of smart devices within a mobile environment. Imagine a person in a living room watching TV while she uses a personal digital assistant (PDA) to browse the TV program. How should the device ensemble (TV set, loudspeakers, PDA) present the information to her? One way to do this might be to present a short introduction by speech on the loudspeakers, to render textual information about the movies on the PDA screen and to show stills of the movies on the screen of the TV set. It is not difficult to imagine such a scenario, which is able to deal with a fixed set of devices, situations and exceptions. But this system will only function properly as long as the developers anticipate the device ensemble that makes up the environment. Considering ubiquitous and mobile device scenarios, this is not always possible. Imagine that the user is leaving her flat and taking her PDA with her. She might be entering an environment which is entirely new to her (e.g. a lecture room or a railway station) and features previously unknown devices. In the lecture room, the slides of the lecture might be presented on the user's PDA. And if the PDA has loudspeakers, the speaker's voice might also be presented acoustically. In the railway station, a ticket-selling machine might present its graphical user interface on the PDA of the user.



• The development and application of a software infrastructure for self-organizing environments
• The development of ensemble topologies for Ambient Intelligence
• The development of conflict resolution strategies for competitive devices

When looking at the challenges of self-organization of device ensembles, we can distinguish two different aspects of selforganization: The architectural integration refers to the integration of a device into the ensemble, whose functionalities can be expressed by means of an ontology. E.g. headphones can be characterized as an acoustic output device, which makes it possible to route a lecturer’s speech input to this device. The operational integration describes the aspect of integrating new functionality, which is unknown to the system and cannot be expressed by an existing ontology. This is also true for assembling a completely new functionality, which was not present on a single device (e.g. the copy functionality provided by means of two VCRs). See [17] for more details about operational integration. DynAMITE focuses on the aspect of architectural integration. That means that device ensembles can be built from individual devices in an ad-hoc fashion (by the user or by the environment itself). Besides self-organizing mechanisms such ensembles need conflict resolution strategies to deal with competitive device situations. Furthermore device ensembles may change over time – due to hardware components entering or leaving the infrastructure

In all these environments, the PDA has to co-operate with the other devices in order to realize a coherent device ensemble.


integration by introducing two fundamental organization levels: Coarse-grained self-organization based on data-flow partitioning (the classification of a device into different components dealing with different parts of the processing pipeline), and fine-grained self-organization for functionally similar components based on a kind of “pattern matching” approach (similar components are connected to the same channels). SodaPop does not rely on a specific data flow topology, but rather allows arbitrary topologies to be created ad-hoc from the components provided by the devices in an ensemble.

or due to changes in the quality-of-service available for some infrastructure services. Especially the coordination of different multimodal output devices is challenging due to the large set of possible output realizations (e.g. graphical user interface, synthesized speech or virtual character). This paper is structured as follows: the next section gives a short overview of the SodaPop system, which we developed within the EMBASSI project [20] and which is the basis for the development of the DynAMITE infrastructure. Then we explain how conflict resolution strategies are applied within a SodaPop device ensemble by means of utility value functions and describe our strategy for coordinating multimodal output devices within a dynamic mobile environment. The types of output devices we consider here are graphical user interfaces, speech synthesizers and virtual characters. After that we describe our implementation, give an overview of the current version of our open source system, which is available from our project website [9], and give some examples of the dynamics of our multimodal output strategy. After the discussion of related work we summarize our results and give an outline of our next steps.

SELF-ORGANIZATION IN DYNAMIC AGENT INFRASTRUCTURES
A central requirement for a software infrastructure supporting architectural integration is the support of ensembles that are built from individual devices in an ad-hoc fashion (either manually by the user or automatically by the infrastructure). Therefore it must not rely on a central controller – any device must be able to operate stand-alone. Such an architecture has to meet the following objectives:
• ensure the independence of components
• allow dynamic extensibility by new components
• avoid central components (to prevent single points of failure or bottlenecks)
• support a distributed implementation
• allow flexible re-use of components
• provide transparent service arbitration

Figure 1. A home entertainment appliance and its internal component structure: here, six transducers are present, which are connected by means of five channels.

The division of the topology into channels and transducers also makes it possible to apply different conflict resolution strategies to each channel. These channel strategies make the communication of components flexible and guarantee the extensibility of the system. In the following section we describe in detail the application of channel strategies by means of utility value functions and outline our approach for a conflict resolution strategy for self-organizing multimodal presentations within a dynamic mobile device ensemble.

To meet these challenges we developed the SodaPop (self-organizing data-flow architectures supporting ontology-based problem decomposition) middleware model for self-organizing infrastructures (for more details please refer to [16] and [19]). This section only outlines the basic principles of SodaPop for a common understanding of our ideas for self-organizing environments and explains the application of strategies for conflict resolution.

CHANNEL STRATEGIES
In order to guarantee the dynamics of the device ensemble (i.e. the possibility to add and remove components and devices at runtime), the application of strategies for conflict resolution must not be hard-wired. In our living room scenario the user wants to inform herself about today's movies. We assume that the description of the movies comes from an electronic program guide as text information. How should this text be presented to the user by means of the output functionalities of the devices which are connected to the presentation channel? Of course it might be possible to implement a hard-wired channel strategy that covers all possible device configurations (e.g. presence of a PDA, loudspeakers busy with a movie, etc.). But what will happen if a completely new situation is encountered (e.g. the introduction of a second PDA)?

First we looked at the internal event processing pipeline of the physical devices we wanted to support [20]. In principle, they observe user input, interpret this input into device actions, change the environment and render output. Figure 1 illustrates this by means of a TV set. Physical user interactions are translated into events. Different components for interpreting these events determine the appropriate action to be performed, and finally the actuators physically execute these actions. In SodaPop we call the components transducers and the interfaces between them channels. The basic idea of SodaPop is to bring more than one device together in order to turn private interfaces between processing stages into public channels. Thus it is possible to achieve an architectural


The strategy we present in this paper implements an approach for problem decomposition within the domain of multimodal presentation planning. Problem decomposition stands for the decomposition of an event into multiple events, which are processed by different components cooperatively.

The next section explains the general channel-transducer architecture in DynAMITE and how utility value functions are used to solve this problem. After that we illustrate the MMO channel strategy. This strategy achieves the ad-hoc integration of new output transducers, which might be located on several output devices, into a coherent presentation.

Multimodal output coordination strategy

Channel-transducer model

System output to the user can be presented by means of different output devices (e.g. displays or speakers) as well as different output modalities (e.g. pictures or speech). In ubiquitous computing scenarios the set of output devices and available rendering components can change dynamically. The strategy we present here is able to include new output devices and rendering agents into the presentation planning process. The types we are supporting are graphical user interfaces (GUIs), virtual characters and speech synthesizer components. The strategy achieves the decomposition of a single output event to multiple events for each rendering component, which are distributed among multiple devices. We call this strategy MMO (multimodal output coordination) strategy.

A transducer, which subscribes to a channel, declares the set of messages it is able to process and how well it is suited for processing a certain message (for more details please refer to [19]). These aspects are described by the transducer’s utility value function. A utility value function is a function that maps a message to an utility value, which encodes the subscriber’s handling capabilities for the specific message. When a channel processes a message, it evaluates all utility value functions of the connected transducers and then decides – dependent from the channel strategy – which transducers will effectively receive the message (see Figure 2) or parts of the message for distributed problem solving.

Imagine a situation, in which a user is working on a desktop PC. System output is currently realized by a GUI-type rendering component and a speech synthesis-type rendering component, which are rendering output on the PC monitor resp. speakers. Both components are connected to a channel implementing the MMO strategy. However the user wants to augment the system by means of a new rendering component, which realizes an animated character. Therefore the user starts a third rendering component on a laptop. This component connects to the same channel as the GUI and the speech components.

Figure 2. The evaluation mechanism of the utility value functions. A channel, which has to deliver an event, evaluates all utility value functions of the connected transducers (left picture) and then decides, according to the channel strategy, which transducer will receive the message (right picture).

After the channel receives an output event (e.g. from a dialog managing component), the MMO strategy evaluates the utility value functions of all connected rendering agents. The utility values contain information about the type of the rendering component (e.g. character or GUI) as well as its current state (e.g. its location, width and size).

Both the transducers’ utility and the channel’s strategy are eventually based on the channel’s ontology – the semantics of the message, which are communicated across the channel. The definition of these ontologies is possible, because we are offering solutions for the architectural integration of dynamic device ensembles. As Figure 1 indicates there are several possible channels within a device topology. Each channel is responsible for delivering certain types of messages to the connected transducers. It is obvious that on each stage of the processing pipeline different strategies might be necessary. For example user output events might be processed by every transducer, which is able to handle it. But some stages of the processing pipeline might make it necessary that only one transducer gets an event even though more than one is able to process it.

Therefore the MMO strategy handler realizes that a new rendering component has been connected, which realizes an animated character consisting of lip movements, head movements, gestures and speech. Due to the fact that both components contain speech output the richer output of the character component is preferred over the speech synthesis component located on the PC. Moreover as the GUI and the character are not located on the same device no layout coordination needs to take place. As a result system output is now realized by means of the GUI on the PC and the animated character on the laptop omitting the speech synthesis.
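The following sketch illustrates, under our own assumptions, the kind of decision the MMO strategy makes in the scenario just described: if an animated character (whose output already contains speech) is available, a plain speech synthesis component is dropped, and layout coordination is only needed when two visual components share a device. The types and rules below are simplified illustrations and do not reproduce the presentation planner used in DynAMITE.

```java
import java.util.*;

/** Hypothetical, simplified sketch of the MMO output-selection decision. */
public class MmoDecompositionSketch {

    enum Kind { GUI, SPEECH_SYNTHESIS, VIRTUAL_CHARACTER }

    record Renderer(String id, Kind kind, String device) {}

    static List<Renderer> selectRenderers(List<Renderer> connected) {
        boolean hasCharacter = connected.stream().anyMatch(r -> r.kind() == Kind.VIRTUAL_CHARACTER);
        List<Renderer> chosen = new ArrayList<>();
        for (Renderer r : connected) {
            // The character's output is richer and already contains speech, so skip plain speech.
            if (r.kind() == Kind.SPEECH_SYNTHESIS && hasCharacter) continue;
            chosen.add(r);
        }
        return chosen;
    }

    static boolean needsLayoutCoordination(Renderer a, Renderer b) {
        return a.device().equals(b.device())
                && a.kind() != Kind.SPEECH_SYNTHESIS && b.kind() != Kind.SPEECH_SYNTHESIS;
    }

    public static void main(String[] args) {
        List<Renderer> connected = List.of(
                new Renderer("gui-1", Kind.GUI, "pc"),
                new Renderer("tts-1", Kind.SPEECH_SYNTHESIS, "pc"),
                new Renderer("char-1", Kind.VIRTUAL_CHARACTER, "laptop"));
        List<Renderer> chosen = selectRenderers(connected);
        System.out.println(chosen);  // GUI on the PC and the character on the laptop
        System.out.println(needsLayoutCoordination(chosen.get(0), chosen.get(1))); // false: different devices
    }
}
```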

Thus, to demonstrate the dynamic abilities of SodaPop, DynAMITE initially realized three simple channel strategies (a minimal sketch of the underlying evaluate-and-dispatch scheme follows the list):
• The all-strategy forwards an event to all those components whose utility value functions return a positive value.
• The random-strategy forwards the event to only one of the components whose utility value function is evaluated positive; this component is determined by a random mechanism.
• The best-strategy selects the component whose utility value function is most appropriate with respect to the task associated with the event (for more details please refer to [19]).
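The sketch below illustrates this evaluate-and-dispatch scheme under our own assumptions: each subscribed transducer exposes a utility value function, the channel evaluates all of them for an incoming message, and the strategy picks the receiver(s). The interfaces and names are invented for illustration and do not reproduce the actual SodaPop Java API.

```java
import java.util.*;

/** Hypothetical channel with pluggable delivery strategies (not the SodaPop API). */
public class ChannelSketch {

    interface Transducer {
        /** Utility value function: how well this transducer can handle the message (<= 0 means "not at all"). */
        double utility(String message);
        void handle(String message);
    }

    private final List<Transducer> subscribers = new ArrayList<>();

    void subscribe(Transducer t) { subscribers.add(t); }

    /** Best-strategy: deliver the message to the single subscriber with the highest positive utility. */
    void publishBest(String message) {
        subscribers.stream()
                .filter(t -> t.utility(message) > 0)
                .max(Comparator.comparingDouble(t -> t.utility(message)))
                .ifPresent(t -> t.handle(message));
    }

    /** All-strategy: deliver the message to every subscriber with a positive utility. */
    void publishAll(String message) {
        subscribers.stream().filter(t -> t.utility(message) > 0).forEach(t -> t.handle(message));
    }

    public static void main(String[] args) {
        ChannelSketch channel = new ChannelSketch();
        channel.subscribe(new Transducer() {  // e.g. an MP3 player assistant
            public double utility(String msg) { return msg.contains("music") ? 0.9 : 0; }
            public void handle(String msg) { System.out.println("MP3 player handles: " + msg); }
        });
        channel.subscribe(new Transducer() {  // e.g. an MPEG player assistant
            public double utility(String msg) { return msg.contains("movie") ? 0.8 : 0; }
            public void handle(String msg) { System.out.println("MPEG player handles: " + msg); }
        });
        channel.publishBest("play a movie");
    }
}
```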

We use rendering component profiles to describe the functionality and restrictions of rendering components concerning the multimodal output which they synthesize. This allows the automatic inclusion of previously unseen rendering components into a presentation at run-time. These profiles are based on Bernsen's model of output unimodalities [7]. Our model of rendering component profiles is described in detail in [10]. In this work we focus on three different types of output components: virtual characters, speech synthesizers and graphical user interfaces. However, an arbitrary number of instances of each component type can be started, and these may be distributed among different devices (like PDAs, laptops or desktop PCs).
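As a rough illustration of what such a profile might contain, the sketch below pairs a set of output modalities with some current state that a strategy could inspect. The enum values, fields and the "richer output" test are our own assumptions; the actual profiles follow Bernsen's model and are specified in [10].

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical rendering component profile (illustration only). */
public class RenderingProfileSketch {

    enum OutputModality { GRAPHICS, TEXT, SYNTHETIC_SPEECH, GESTURE, FACIAL_ANIMATION }

    record RenderingProfile(String componentId, String device,
                            Set<OutputModality> modalities,
                            int screenX, int screenY, int width, int height) {

        /** A component is "richer" if its modalities are a strict superset of the other's. */
        boolean subsumes(RenderingProfile other) {
            return modalities.containsAll(other.modalities())
                    && modalities.size() > other.modalities().size();
        }
    }

    public static void main(String[] args) {
        RenderingProfile speech = new RenderingProfile("tts-1", "pc",
                EnumSet.of(OutputModality.SYNTHETIC_SPEECH), 0, 0, 0, 0);
        RenderingProfile character = new RenderingProfile("char-1", "laptop",
                EnumSet.of(OutputModality.SYNTHETIC_SPEECH, OutputModality.GESTURE,
                        OutputModality.FACIAL_ANIMATION, OutputModality.GRAPHICS), 100, 100, 320, 400);
        System.out.println(character.subsumes(speech)); // true: the character's output is richer
    }
}
```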



For the execution of the MMO strategy we use the presentation planning tool described in [5]. This planner is the core of the MMO strategy. The software uses a hierarchical AI planning approach to decompose a complex presentation goal into primitive presentation acts. The primitive acts send rendering messages to a set of rendering components together with instantiated parameters for each component (e.g. screen location or emotions to be expressed). After that each rendering component displays the content to be rendered accordingly.

IMPLEMENTATION In this section we give an overview of our current implementation of the SodaPop Framework and its channel strategies. The SodaPop-framework as well as the Java API and the demonstrators are freely available from the DynAMITE website [9]. The API allows the implementation of own transducers and channels, as well as the application of the introduced channel strategies. Thus the realization of topologies as illustrated in Figure 1 becomes possible.

Figure 4. The example topology we provide together with our framework. The most appropriate assistant is chosen by means of the best-strategy, while the decomposition of rendering events is done by the MMO-strategy.

To demonstrate the dynamic abilities of the SodaPop model and the channel strategies we implemented the following components:

Figure 3. A transducer consists of two classes: the transducer itself and a SodaPopHandler, which handles the utility value evaluation and the final execution of a message. The picture on the right shows what physically happens.

Figure 3 illustrates some details of the SodaPop Java API. First a transducer instantiates the channels it wants to connect to. A channel description consists of a channel name and a parameter indicating whether the transducer only listens to messages on this channel or also wants to send events to it. Finally the channel strategy name is given. The transducer in the example illustrated in Figure 3 is connected to two channels, whereas both channels have the strategy "random". Because the transducer is connected to channel "A" as a listener, a SodaPopHandler must be instantiated that handles the evaluation of the utility value function (Figure 3 top left). The SodaPopHandler also handles the execution of a message if the channel strategy decides that this transducer will receive the event for further processing (Figure 3 bottom left).



• a ViaVoice-based speech recogniser [39] to recognize user utterances
• a graphical user interface
• a dialog management component, which controls dialogs with the user
• a speech synthesis rendering component
• a virtual character rendering component
• two MPEG-player applications with different device properties
• an MP3-player application

If the user wants to watch a movie, she simple says “I want to see a movie please”, or if available presses the corresponding button of the graphical user interface. Interpreting this sentence, the dialog management component starts a disambiguation dialog with the user. Every system output is triggered by means of an output event, which is sent by the dialog management to the multimodal output channel. As soon as a movie (or a piece of music) is defined, the dialog management gives the task to the assistant channel, where it is conducted by one of the different available player applications.

By means of this API a demonstrator was built, especially illustrating the dynamics of the multimodal presentation components. This demonstrator realizes a home entertainment scenario. The topology of the demonstrator is illustrated in Figure 4. The topology implements the main channels of our generic appliance topology as illustrated in Figure 1. The user is able to interact with this environment either by speech or by graphical user interfaces to define movies or pieces of music that should be played by means of MP3-player applications or MPEG-player applications. The topology introduces three different channels. The dialog channel (the D-channel in Figure 4, top left) communicates user interaction events (the user's speech or graphical user interface events). The assistant channel receives the task of playing a movie or a piece of music and transfers it to the most appropriate MPEG-player or MP3-player assistant (Figure 4, bottom). Finally, the multimodal output channel (Figure 4, top right) realizes the coordination of the different rendering components to map the amodal text information onto different multimodal output representations. As illustrated in one of the previous sections, the number of output components is unlimited. There is also no restriction concerning the combination of the different output components.

The graphics part of the virtual character component [2] is implemented in Java 3D. The character synchronizes its lip movements with speech generated by the MBrola speech synthesis [30]. The graphic user interface component consists of a Java GUI, in which the user can make selections by means of a choice box. The speech synthesis component uses an IBM Via Voice implementation of the Java speech API [39]. We also implemented a second type of speech synthesis, which is based on MBrola. This integration of an additional speech synthesis


After that the user switches off the TV and boots a laptop, because it contains some movies, which she did not find on the TV set system (Figure 5, bottom row, left picture). On the laptop only a GUI rendering component is present. After the GUI on the laptop is used for input the GUI switches again from PDA to laptop. However the speech synthesis contained on the PDA is used to augment the GUI output on the laptop. However the user prefers a richer output and installs the software for the animated character on the laptop. In order to prevent two speech renderings at once (by means of the speech synthesis as well as by the animated character) the speech synthesis on the PDA is switched off, a layout coordination between the character and the GUI is conducted and the user achieves the desired result (Figure 5, bottom row, right picture).

component shows the modularity of the demonstrator concerning the integration of new rendering components. Please note that the application of only one dialog management component is no restriction of the introduced principles of dynamic self-organizing device environments. The application of channel strategies makes it able to apply as many dialog management components as needed. We decided to apply only one dialog management component to keep the demonstrator as simple as possible. Thus one is able to concentrate on the applicability of the different channel strategies.

EXAMPLES In this section we provide examples of the application for the MMO channel strategy in our implementation. We start with a virtual character rendering component and a graphical user interface running on the same device (Figure 5, top row, left picture). Together they instantiate a channel, which applies the MMO-strategy (cf. MMO channel in Figure 4).

RELATED WORK Research in the field of self-organizing device ensembles covers both co-operation and communication mechanisms of agent platform technologies and coordination strategies for multimodal output. We are especially interested in approaches for assisting the user while dealing with unknown device ensembles. Concerning the automated generation of multimodal presentations a lot of research has been conducted focusing on single-device scenarios. We give an overview over known approaches with respect to problem decomposition and point out differences to multi-device environments.

Self-organization of device ensembles In the past, many initiatives presented concepts for managing agent communication processes or addressed the problem of dynamic, self-organizing systems. The KQML [26] and FIPA [12] agent communication languages provide powerful methods for defining communication acts, as well as methods for service discovery. But the deployment within high dynamic ensembles is not possible. KQML as well as FIPA are using central routing components for delivering messages from agent to agent or a hard-wired peer-to-peer communication mechanism. Furthermore service discovery can be done by another heavy weight component, called yellow-page-service or facilitator, which allows the subscription for events as well as appropriate agents offering special services. A mechanism for conflict resolution does not exist. There is no solution if the service discovery function offers more than one possible address component. The Open Agent Architecture (OAA) (see [27] and [35]) and the Galaxy Communicator Architecture [13] provide architectures for multi-agent systems supporting multimodal interaction. Whereas Galaxy uses a centralized hub-component, which uses routing rules for modeling how messages are transferred between different system components, the OAA uses a special meta-agent with Prolog-based decomposition and recombination strategies. Both approaches have advantages as well as disadvantages. The routing rules of Galaxy as well as the strategies implemented within the OAA meta agents make dynamic sampling of nonstatic ensembles very difficult. Rules have to be adapted each time the configuration of the ensemble changes. On the other hand, on top of Galaxy and OAA systems can be built, which behave predictably in every possible situation. Galaxy as well as the OAA do not support mechanisms for structuring a component topology, which means that a designer has to know the whole system (and its behavior rules) and all interfaces, which connect the different components.

Figure 5. Examples for the coordination of different output components running on different physical devices. The coordination is done by the MMO-channel strategy. The window of the animated character is adapted according to the size and location of the GUI window. After the evaluation of all utility values functions the MMO strategy knows by means of the rendering component profile that the character is able to look at the GUI, which is also executed. However the user rather prefers to access the system by means of a PDA and starts a GUI component and a speech synthesis component on it (Figure 5, top row, right picture). Together they extend the previously existing MMO-channel. The GUI is still hidden until the MMO strategy decides to pop it up. With the next presentation task, the presentation planning strategy takes the utility value of the newly connected graphical user component into concern. The utility value indicates that the user has chosen the GUI on the PDA for processing an input (by means of a choice box). Therefore the strategy hides the other GUI on the TV screen assuming that the users rather want to work on the PDA than on the TV screen. The GUI on the PDA is also adapted according to the resolution of the PDA screen. The character on the TV screen is now looking at the user instead of the GUI, which is not visible on the TV screen anymore. The speech synthesis on the PDA is not used as the richer graphicalacoustical output of the character is preferred.
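To make the role of the utility value functions more concrete, the following minimal Java sketch illustrates the kind of decision the MMO strategy takes in the example above. It is purely illustrative: the class names, the profile fields and the weights are invented for this sketch and do not correspond to the actual DynAMITE implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical profile of a rendering component attached to an MMO channel. */
class RenderingProfile {
    final String name;               // e.g. "GUI@PDA", "Character@TV"
    final boolean lastUsedForInput;  // did the user just interact with it?
    final boolean visibleToUser;

    RenderingProfile(String name, boolean lastUsedForInput, boolean visibleToUser) {
        this.name = name;
        this.lastUsedForInput = lastUsedForInput;
        this.visibleToUser = visibleToUser;
    }

    /** Toy utility value function: favour components the user interacted with last. */
    double utility() {
        double u = 0.0;
        if (lastUsedForInput) u += 0.6;
        if (visibleToUser)    u += 0.3;
        return u;
    }
}

/** Sketch of an MMO-style strategy: evaluate utilities, activate the best component. */
public class MmoStrategySketch {
    public static void main(String[] args) {
        List<RenderingProfile> channel = new ArrayList<>();
        channel.add(new RenderingProfile("GUI@TV", false, true));
        channel.add(new RenderingProfile("Character@TV", false, true));
        channel.add(new RenderingProfile("GUI@PDA", true, false));

        // Pick the highest-utility graphical component; hide the others.
        RenderingProfile best = channel.stream()
                .max(Comparator.comparingDouble(RenderingProfile::utility))
                .orElseThrow();
        for (RenderingProfile r : channel) {
            System.out.println(r.name + (r == best ? " -> show" : " -> hide"));
        }
    }
}
```

In the real system such a strategy would additionally consult the rendering component profiles (e.g. whether a character can look at a GUI) before executing the chosen layout.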

RELATED WORK

Research in the field of self-organizing device ensembles covers both co-operation and communication mechanisms of agent platform technologies and coordination strategies for multimodal output. We are especially interested in approaches for assisting the user while dealing with unknown device ensembles. Concerning the automated generation of multimodal presentations, a lot of research has been conducted focusing on single-device scenarios. We give an overview of known approaches with respect to problem decomposition and point out differences to multi-device environments.

Self-organization of device ensembles

In the past, many initiatives presented concepts for managing agent communication processes or addressed the problem of dynamic, self-organizing systems. The KQML [26] and FIPA [12] agent communication languages provide powerful methods for defining communication acts as well as methods for service discovery, but their deployment within highly dynamic ensembles is not possible. Both KQML and FIPA use central routing components for delivering messages from agent to agent, or a hard-wired peer-to-peer communication mechanism. Furthermore, service discovery is handled by another heavyweight component, called a yellow-page service or facilitator, which allows subscribing to events as well as finding appropriate agents offering special services. A mechanism for conflict resolution does not exist: there is no solution if the service discovery function offers more than one possible component to address. The Open Agent Architecture (OAA) (see [27] and [35]) and the Galaxy Communicator Architecture [13] provide architectures for multi-agent systems supporting multimodal interaction. Whereas Galaxy uses a centralized hub component, which uses routing rules to model how messages are transferred between different system components, the OAA uses a special meta-agent with Prolog-based decomposition and recombination strategies. Both approaches have advantages as well as disadvantages. The routing rules of Galaxy as well as the strategies implemented within the OAA meta-agents make the dynamic sampling of non-static ensembles very difficult: rules have to be adapted each time the configuration of the ensemble changes. On the other hand, systems can be built on top of Galaxy and the OAA that behave predictably in every possible situation. Neither Galaxy nor the OAA supports mechanisms for structuring a component topology, which means that a designer has to know the whole system (and its behavior rules) and all interfaces connecting the different components.

There are other research approaches (e.g. the Task Computing initiative [28]) supporting task-aware computing, which bear some resemblance to the ideas of activity-based computing, which Don Norman described in 1976 [34]. The concept of situation-aware assistance likewise provides more assistance to the user and relieves her of the responsibility of controlling every task and device separately. Here, Kirste [22] and Heider [17] describe the principles that are used to develop the EMBASSI architecture [20] and are the starting point for our self-organizing software infrastructure SodaPop. SodaPop is described in detail in Kirste [16] and Hellenschmidt [19]. The Speakeasy project, which makes device and service interfaces available to users on multiple platforms [32], appears to be a similar approach to the Task Computing initiative, but its basic principle is very different from the DynAMITE approach. Whereas Speakeasy and Task Computing make the control of all devices (and their services) available to the user – and thus give all responsibilities to her – the project DynAMITE, and the work we present here, goes one step further. SodaPop, the software infrastructure of DynAMITE, realizes self-organizing device ensembles that offer conflict resolution strategies, making the co-operation of devices possible without requiring the engagement of the user.

Multimodal output coordination

There has been a lot of research in the past focusing on the generation of multimodal presentations on a single output device. Classic presentation planning systems described in [3], [11], [21] or [40] dealt with static output device environments, which were not supposed to change during a system run. In particular, presentation planning in multimodal dialog systems is still restricted to a single output device. The presentation planning system described in [31] generates and coordinates multimodal output consisting of an animated character, pictures, maps and speech synthesis. The output is adapted to different display sizes; however, the system does not support more than one output device.

The Berlin subway system plays Macromedia Flash adverts, which are presented synchronously on two displays in parallel without using sound [6]. A user study showed 82% approval of the system, which indicates that coherent multi-device presentations can be appealing. However, the system uses a static output device environment; both screens have the same size and are adjacent to each other, which need not necessarily be true in other ubiquitous computing environments. Robertson [36] built a prototype system for combining a PDA with a TV set. In this system the TV screen is used to add additional information, like pictures or maps, to the graphic PDA output. This system also relies on a static device environment. Han [15] describes the WebSplitter system, which is able to split a multimedia HTML document among available output devices like a laptop, a PDA or speakers; however, the distribution of the content among the devices has to be done manually by the user. Kruppa and Krüger [24] investigate interaction paradigms between PDAs and large displays. Kray [23] presents an architecture for presentation planning in a multi-device and multi-user environment. A central planning component generates and synchronizes SMIL [38] presentations on multiple displays and speakers. However, it is not possible to take the different presentation functionalities available on each device into account; e.g., playing a movie might be possible on a desktop computer, but might not be possible on a PDA.

Moreover, all of these approaches rely on a central presentation planning component, which is physically located on a certain device. For ubiquitous computing scenarios this is a setback: if the user switches off the device on which this component is located, it is not possible anymore to plan a presentation. Nowadays systems begin to emerge which also take the dynamic addition of new output devices and services into account. This is crucial for ubiquitous computing environments, in which the device environment frequently changes.

In multi-device environments, teams of animated characters can be employed to mediate between devices [25]. In the system presented in [14], three avatars are split across two screens. André and Rist [4] also showed that the use of presentation teams results in believable dialogs. In order to be able to dynamically include new devices containing new rendering agents (e.g. a speech synthesis agent on a laptop) into a multi-device presentation, the new rendering agents have to identify themselves to the rest of the system by means of a service description. Ideally, it should be possible to describe any rendering agent type by means of this description (e.g. movie player agents or speech synthesis agents). The model presented in [7] contains a complete and unique classification of all basic output modalities within the media of acoustics, graphics and haptics.

CONCLUSION

In this work we presented the results of the project DynAMITE (Dynamic Adaptive Multimodal IT-Ensembles). We introduced our implementation of the SodaPop middleware model for self-organizing data-flow architectures, which separates a component topology into channels and transducers. The goal of DynAMITE is the architectural integration of new components, which might be located on mobile devices. We illustrated the four conflict resolution strategies for competing components provided by our framework. One of these strategies, the multimodal presentation strategy, coordinates presentation components of three types: graphical user interfaces, virtual characters and speech synthesizers. This strategy makes the application of an unlimited number of output components running on different devices possible. The advantages of our approach are the absence of a central component and of hard-wired intelligence: self-organizing topologies are enabled, and conflicts are resolved by strategies based on the evaluation of utility value functions.

We introduced our demonstrator, which can be downloaded from our project web site. The demonstrator realizes a home entertainment application illustrating the applicability of our approach. The next steps of our work in the DynAMITE project include the definition of basic component topologies usable as a blueprint for advanced scenarios, as well as the definition of common vocabularies and ontologies for describing services and devices (e.g. UPnP). In previous work we already investigated the use of user profiles and output preferences with respect to multimodal environments [18]. We intend to include similar preference models in the strategies of the DynAMITE system.

We also intend to evaluate our approach under real-life conditions. Here we intend to investigate issues related to the usability of our system. A lot of research has been conducted on the cognitive effects of multimodal presentations on a single device; however, the effects of multi-device presentations still remain to be investigated. Moreover, in Ubicomp systems, where computers disappear, there is a danger that users feel lost and cannot exploit the system's functionality [33]. We will investigate this in relation to different types of interface paradigms for Ubicomp home entertainment systems [29].

25

ACKNOWLEDGEMENTS

The work presented in this paper was funded by the German ministry for education and research under the grants 01 IS C27 A and 01 IS C27 B as well as by the Klaus Tschira foundation.

REFERENCES

[1] Aarts E.: Ambient Intelligence: A Multimedia Perspective, IEEE Multimedia (2004), 12-19.
[2] Alexa M., Berner U., Hellenschmidt M., Rieger T.: An Animation System for User Interface Agents, Proceedings of WSCG 2001, The 9th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, 2001.
[3] André E., Müller J., Rist T.: WIP/PPP: Automatic Generation of Personalized Multimedia Presentations, 4th ACM International Multimedia Conference, Boston, MA, November 1996.
[4] André E., Rist T.: Presenting Through Performing: On the Use of Multiple Lifelike Characters in Knowledge-Based Presentation Systems, Int. Conf. on Intelligent User Interfaces IUI 2000, New Orleans, LA, 2000.
[5] André E., Baldes S., Kleinbauer T., Rist T.: CkuCkuCk 1.01 Planning Multimedia Presentations - User Manual and Installation Guide, available from http://www.dfki.de/imedia/miau/software/CkuCkuCk/manual/manual.html, 2000.
[6] Berliner Fenster – Das Fahrgastfernsehen, http://www.berliner-fenster.de, 2004.
[7] Bernsen N. O.: Multimodality in Language and Speech Systems - From Theory to Design Support Tool, in: Granström (ed.), Multimodality in Language and Speech Systems, Kluwer Academic Publishers, 2001.
[8] Ducatel K., Bogdanowicz M., Scapolo F., Leijten J., Burgelman J.-C.: Scenarios for Ambient Intelligence 2010, ISTAG report, European Commission, Institute for Prospective Technological Studies, Seville, Nov. 2001.
[9] DynAMITE, http://www.dynamite-project.org, 2004.
[10] Elting C., Möhler G.: Modeling Output in the EMBASSI Multimodal Dialog System, Int. Conf. on Multimodal Interfaces ICMI 02, Pittsburgh, PA, October 14-16, 2002.
[11] Feiner S. K., McKeown K. R.: Automating the Generation of Coordinated Multimedia Explanations, in: Mark Maybury & Wolfgang Wahlster (eds.), Readings in Intelligent User Interfaces, pp. 89-97, Morgan Kaufmann Publishers, 1998.
[12] FIPA, The Foundation for Intelligent Physical Agents, http://www.fipa.org, 2002.
[13] Galaxy Communicator Infrastructure, Project Homepage, available from http://sls.csail.mit.edu/sls/technologies/galaxy.shtml, 2001.
[14] Gebhard P., Kipp M., Klesen M., Rist T.: What Are They Going to Talk About? Towards Life-Like Characters that Reflect on Interactions with Users, Int. Conf. on Technologies for Interactive Digital Storytelling and Entertainment TIDSE'03, Darmstadt, Germany, March 24-26, 2003.
[15] Han R., Perret V., Naghshineh M.: WebSplitter: A Unified XML Framework for Multi-Device Collaborative Web Browsing, ACM Conference on Computer Supported Cooperative Work CSCW, Philadelphia, PA, 2000.
[16] Heider Th., Kirste Th.: Architecture considerations for interoperable multi-modal assistant systems, in: Forbrig, Peter (Ed.), Interactive Systems: Design, Specification, and Verification, Proceedings, Berlin, Heidelberg, Springer, 2002, pp. 253-267.
[17] Heider Th., Kirste Th.: Supporting goal-based interaction within dynamic intelligent environments, Proc. 15th European Conference on Artificial Intelligence (ECAI 2002), Lyon, France, 2002.
[18] Hellenschmidt M., Kirste Th., Rieger Th.: An Agent Based Approach to Distributed User Profile Management within a Multi-Modal Environment, IMC Workshop 2003 Proceedings: Assistance, Mobility and Applications, Stuttgart, Fraunhofer IRB Verlag, 2003, pp. 129-135.
[19] Hellenschmidt M., Kirste Th.: SodaPop: A software infrastructure supporting self-organization in Intelligent Environments, 2nd IEEE Conference on Industrial Informatics, INDIN 04, Berlin, Germany, June 2004.
[20] Herfet Th., Kirste Th., Schnaider M.: EMBASSI - Multimodal Assistance for Infotainment and Service Infrastructures, Computers & Graphics 25(4), pp. 581-592, 2001.
[21] Kerpedjiev S., Carenini G., Roth S., Moore J.: Integrating Planning and Task-based Design for Multi-media Presentation, International Conference on Intelligent User Interfaces IUI'97, pp. 145-152, Orlando, FL, 1997.
[22] Kirste Th.: Situation-Aware Mobile Assistance, Proc. 1st EC/MSF Advanced Research Workshop, Bonas, France, June 1999.
[23] Kray C., Krüger A., Endres C.: Some Issues on Presentations in Intelligent Environments, European Symposium on Ambient Intelligence, Eindhoven, Netherlands, Nov. 3-4, 2003.
[24] Kruppa M., Krüger A.: Concepts for a Combined Use of Personal Digital Assistants and Large Remote Displays, Proceedings of SimVis 2003, Magdeburg, March 2003.
[25] Kruppa M.: The better remote control – Multiuser interaction with public displays, Workshop on Multi-User and Ubiquitous User Interfaces (MU3I), January 13, 2004.
[26] Labrou Y., Finin T.: A Proposal for a new KQML Specification, TR CS-97-03, Computer Science and Electrical Engineering Department, University of Maryland Baltimore County, Baltimore, MD 21250, February 1997.
[27] Martin D.L., Cheyer A.J., Moran D.B.: The Open Agent Architecture: A Framework for Building Distributed Software Systems, Applied Artificial Intelligence, Vol. 13, No. 1-2, pp. 91-128, January-March 1999.
[28] Masuoka R., Parsia B., Labrou Y.: Task-Computing – the Semantic Web meets Pervasive Computing, Proc. of the 2nd International Semantic Web Conference (ISWC), Sanibel Island, Florida, USA, October 2003.
[29] Meyer zu Kniendorf Ch.: Interaktionskonzepte in verteilten und vernetzten Systemen am Beispiel der Unterhaltungselektronik, 5. Berliner Werkstatt Mensch-Maschine-Systeme, ZMMS Spektrum, Band 18, Auszug aus Fortschritt-Berichte VDI, Reihe 22, Nr. 16, S. 488-506, Düsseldorf: VDI Verlag GmbH, 2004.
[30] MBROLA – Towards a Freely Available Multilingual Synthesizer, Faculté Polytechnique de Mons, MULTITEL-TCTS Lab, available from http://tcts.fpms.ac.be/synthesis/mbrola.html, Mons, Belgium, 1999.
[31] Müller J., Poller P., Tschernomas V.: Situated Delegation-Oriented Multimodal Presentation in SmartKom, Workshop on Intelligent Situation-Aware Media and Presentations ISAMP, Edmonton, 2002.
[32] Newman M.W., Izadi S., Edwards W.K., Sedivy J.Z., Smith T.F.: User Interfaces When and Where They are Needed: An Infrastructure for Recombinant Computing, Proceedings of UIST 02, Paris, France, 2002.
[33] Nijholt A., Rist T., Tuijnenbreijer K.: Lost in Ambient Intelligence?, CHI 2004 Workshop "Lost in Ambient Intelligence?", Vienna, Austria, April 25, 2004.
[34] Norman D.A.: The Psychology of Everyday Things, Basic Books, New York, first edition, 1976.
[35] Open Agent Architecture, http://www.ai.sri.com/~oaa/, 2001.
[36] Robertson S., Wharton C., Ashworth C., Franzke M.: Dual Device User Interface Design: PDAs and Interactive Television, CHI, Vancouver, British Columbia, Canada, 1996.
[37] Shadbolt N.: Ambient Intelligence, IEEE Intelligent Systems, 2-3, 2003.
[38] SMIL, W3C Recommendation: Synchronized Multimedia Integration Language (SMIL 2.0), 2001.
[39] Via Voice, IBM, http://www.ibm.com/software/voice/viavoice/
[40] Zhou M. X., Feiner S. K.: Efficiently Planning Coherent Visual Discourse, Journal of Knowledge-based Systems, 10(5), pp. 275-286, 1998.

A context inference and multi-modal approach to mobile information access

David West, School of Information Technologies, University of Sydney, Sydney, Australia, [email protected]
Trent Apted, National ICT Australia, Australian Technology Park, Eveleigh, Australia, [email protected]
Aaron Quigley, School of Information Technologies, University of Sydney, Sydney, Australia, [email protected]

ABSTRACT

Multimodal context-aware computing offers system support that incorporates the environmental, cognitive and computational state of an individual while allowing them to compose inputs and receive outputs across a range of available modalities. Here we present two support planes from our ubiquitous system architecture, entitled Nightingale. These include a multi-modal application framework and a context-management and inference approach based on ontologies. In Nightingale the devices in use form a personal area network (PAN) with access to local (PLAN) or remote (PWAN) devices or services as they become available. Our proposed architectural elements aim to simplify multimodal and multi-device interaction as the proliferation of small personal and inter-connected computing devices increases.

ACM Classification Keywords

H.3.4 [Information Storage and Retrieval]: Systems and Software; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Interaction styles; I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods

INTRODUCTION

Typically the goal of a pervasive computing system is to allow people to access their information or information services regardless of the actual technology currently available [5, 16, 18]. Such a goal presents a number of research challenges including security, mobile data management, human computer interaction, and distributed computing. In particular, "multimodal interfaces" and "context awareness" are emerging as major trends in the realisation of pervasive computing systems. Multimodal interfaces allow individuals to interact with systems using a combination of their innate faculties (e.g. touch, sight, hearing) or common skills (e.g. speaking, reading, writing) [3, 7]. Context awareness allows individuals to interact with systems which are aware of the environmental state (e.g. location, workgroup, activity) and computational state (e.g. applications, devices, services) of an individual [6].

Consider a home environment, such as that demonstrated in the EasyLiving project [4], which has been instrumented to collect contextual information and offer multi-modal applications. In such a set-up there are embedded computers and control elements, identification and authentication elements, a home network, an entertainment or media hub, wireless networking, service discovery, middleware and operating system support, and handheld computers for data access throughout the home. Within the home environment, an example multimodal application may incorporate speech, universal remote and gestural input to a stereo system. An action such as a button press, a pointing gesture to the stereo and a spoken command of "Set this preference button to that stereo" will be resolved by the home system to correlate a certain button press with a given set of preferences for the home stereo. Within the EasyLiving environment, "disaggregated computing" allows a user's location and preferences to determine which set of inputs and outputs, across a set of computers, are connected to the currently active applications. While managing, resolving and fusing the inputs is a challenging task, so too is the problem of coordinating the devices, preferences and services across the home for a set of users [5, 8]. Further complicating this issue is the ability of a user to cognitively deal with a large permutation set of options, both within the home and while on the move [21]. In such an environment a context-aware system manages and stores environmental data, device profiles, resources and user preferences. The goal is to simplify, through automatic (learnt or specified) or semi-automatic means, the multi-device, multi-user, and multimodal system support issues.

Our motivation for this research is to realise a pervasive computing system for supporting applications through non-desktop interfaces for accessing, organising and interacting with one's own suite of "personal server" devices. The rest of this paper is organised as follows: the background section describes related and ongoing work in multimodal and context-aware computing, the Nightingale section describes our multimodal application framework and our approach to context representation and inference, and the final section outlines our conclusions and future work in this area.

This research is funded through the Nightingale Project and is supported by the Smart Internet Technology Co-operative Research Centre and National ICT Australia. National ICT Australia is funded through the Australian Government's Backing Australia's Ability initiative, in part through the Australian Research Council.

BACKGROUND

Multimodal Interfaces

Seminal work on multimodal interfaces includes Bolt's "Put-That-There" system from 1980 [3], which involves the fusion of speech and pointing gestures to create, move, copy, name and remove objects on a large 2D screen. Input fusion attempts to combine the inputs from multiple modalities, either directly or through recognition, to form an aggregate or disambiguated input. For example, in [3] a user can issue a command such as "create a yellow triangle there". Deictic references were resolved by referencing the object at the x, y coordinate indicated by the cursor at the time the reference was spoken. In addition, the user could name objects for future referencing by issuing a command such as "call that my object". When the system recognises this command utterance it immediately switches to training mode so that the name of the object can be learnt by the system.
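The following minimal Java sketch only illustrates the idea of this kind of fusion: a spoken deictic reference ("there") is resolved against the pointing gesture closest in time to the utterance. It is a schematic of the principle under invented names, not a description of how Bolt's system was actually implemented.

```java
import java.util.List;

/** A timestamped pointing event on the 2D display (hypothetical). */
record PointEvent(long timeMillis, int x, int y) {}

/** Minimal late-fusion sketch: resolve "there"/"that" against the closest-in-time pointing event. */
public class DeicticFusionSketch {

    static PointEvent resolveDeictic(long utteranceTimeMillis, List<PointEvent> recentPoints) {
        PointEvent best = null;
        long bestDelta = Long.MAX_VALUE;
        for (PointEvent p : recentPoints) {
            long delta = Math.abs(p.timeMillis() - utteranceTimeMillis);
            if (delta < bestDelta) { bestDelta = delta; best = p; }
        }
        return best; // null if no gesture was observed at all
    }

    public static void main(String[] args) {
        List<PointEvent> points = List.of(new PointEvent(1000, 120, 80), new PointEvent(2500, 400, 300));
        // Suppose "create a yellow triangle there" was recognised at t = 2450 ms.
        PointEvent target = resolveDeictic(2450, points);
        System.out.println("Deictic reference resolved to screen position (" + target.x() + ", " + target.y() + ")");
    }
}
```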

Since this early work a number of multimodal applications [14] and architectures [7, 10] have been developed. One common infrastructure in multimodal architectures is a multi-agent architecture, in which the various components of the multimodal system, such as speech recognisers, gesture recognisers, application processing logic, etc., are implemented as semi-autonomous agents which communicate through a support infrastructure. Here an agent is a software element that acts on, or has the power or authority to act on behalf of, another element. The QuickSet system [7] is the basis for a number of military research applications, integrating a pen and speech multimodal system. QuickSet goes beyond using a pen for simple pointing commands, recognising more complex gestures, such as lines and arrows, via a neural network and a set of Hidden Markov Models (HMM). The system employs a logic-based approach to multimodal fusion involving the use of typed feature structures and unification-based integration [14]. Subsequent research based on the QuickSet approach and others has shown that combining the input from various input modalities can reduce error rates in interpreting user intent by as much as 40% compared to systems using only one modality for input [14]. In effect, there can be mutual disambiguation between input signals, which can be used to stabilise individual, inherently error-prone recognition technologies like speech and handwriting recognition.

In parallel to many of these research efforts, the W3C have specified a multimodal interaction framework [2] to support the deployment of multimodal content on web-enabled devices. The framework identifies the major components of multimodal systems, and the markup languages used to describe information required by components and for data flowing between components. The input component consists of various elements for the recognition, interpretation, and integration of input modes, such as pen, speech, hand-writing, etc. Raw input signals from the user are sent to the recognition components specific to each modality. The recognition component may use a grammar described by a grammar mark-up language to produce useful input to the interpretation component. The interpretation component involves semantic interpretation of the output from the recognition stage.

The W3C are standardising an XML-based language called the Extensible MultiModal Annotation Language (EMMA) to represent the input to the interaction manager in an application-defined manner. An optional integration component takes the EMMA results from various modalities, unifies them, and sends the combined result to the interaction manager. The interaction manager coordinates the dialogue between the application and the user. It hosts the interface objects (input and output) as DOM-based interfaces. Application state is stored in the session component, allowing the interaction manager to support state management, and temporary and persistent sessions for multimodal applications. The system and environment component supports application knowledge of device capabilities, user preferences and environmental conditions. Application output passes from the interaction manager to the generation component. The generation component is responsible for deciding which mode, or modes, will be used to present information to the user. The styling component for a particular modality accepts output from the generation component. This converts the modality-neutral output into a format suitable for display by the particular output modality, using a style sheet mark-up language (such as CSS). The styled output is finally passed to the rendering component, which renders the output to produce the output signal.

Context

While originally from linguistics, context is generally taken to mean "that which surrounds, and gives meaning to, something else". However, early research in context-aware application programming focused almost exclusively on location and time [5]. Schmidt et al. presented some ideas for furthering these notions of context at a low level using sensor fusion and later presented the European project TEA (Technology for Enabling Awareness) [20], which expanded the notions of context to include light level, pressure and proximity to other people, as well as a resolution layer to determine a user's activity.

Before this work, the narrow view of context meant that sophisticated modelling, resolving and naming techniques were not required. For most applications, all that was required was the choice of a symbolic or a geometric model of location. Schilit [19] was the first to present an architecture for context-aware computing, but again the focus was on location and it did not involve resolution into higher levels of context. CyberDesk deals only with a different type of context information – a user's selected text – but took a modular approach to context acquisition, abstraction and client services that later inspired the Context Toolkit [8]. However, in the Context Toolkit, context data is not separated from the sensors, and the path context flows through the system must be manually specified by application developers [9].

In order to share context between applications, there must be a standard way of modelling the context information and standard protocols for the exchange of this information. There are many tools to accomplish the exchange of generic data between heterogeneous applications, including CORBA and Jini. However, an infrastructure should be as agnostic as possible with respect to hardware platform, operating system and programming language [9]. Furthermore, Ranganathan and Campbell [16] observe that such generic middleware falls short of providing ways for clients to be context-aware. They proposed another layer to facilitate the acquisition of and reasoning about context, as well as the modification of clients' behaviour.

The work of Ranganathan et al. on the GAIA project [17] involves the development of their own middleware layer built over CORBA. They use ontologies to describe context predicates in their middleware in order to facilitate various types of inference, including rule-based and machine learning approaches. This enables their agents in different ubiquitous computing environments to have a common vocabulary and a common set of concepts when interacting with each other, while their Ontology Server assists with configuration management and context-sensitive application behaviour.

Strang et al.'s [21] investigation of ontologies for pervasive computing has inspired our research into this area. Rather than defining a single, monolithic language, they decompose their Context Ontology Language (CoOL) into a collection of several, grouped fragments and reach a high degree of formality by using ontologies as a fundament for their Aspect-Scale-Context (ASC) model. This model allows context information to be characterised and thus shared (and queried) independently of its representation. Wang et al. [23] are also looking at ontologies for modelling context in pervasive computing environments, specifically to leverage existing Semantic Web logic-based reasoning techniques. They have investigated the scalability of ontology-based context reasoning (using a framework they have called CONON) compared to user-defined, rule-based reasoning.

NIGHTINGALE

Project Nightingale aims to explore pervasive computing interaction mechanisms beyond the classical screen, key and mouse. As such, this research touches on aspects which include natural and adaptive interfaces, mobile data management, trusted wireless networking standards and personalisation. In this section we discuss two of our higher-level architectural elements, namely multimodal interfaces and context-aware computing.

Contribution

Most current multimodal applications and frameworks have been designed for operation on a single device or a limited number of closely coupled devices [14, 10]. A number of these systems take an agent-based approach to support distributed processing, where agents are closely tied to a single application or a small number of applications. Agent-based approaches are typically structured around an input agent providing an interpretation of users' intent, involving application-specific logic, and knowledge of which application agents they must bind to in advance.

Providing support systems that offer both context awareness and multi-modal interaction has received little attention. When considering mobile multi-device scenarios there is a clear need to create scalable and reusable architectures that can cope with sensed environmental context data along with the myriad devices, input modes and signals that will be routed to various applications – themselves residing on arbitrary devices; for example, applications running on a user's mobile phone accessing services from their PDA.

Such architectures should support perceptual spaces [13], in which the input provided by users is interpreted within the pervasive computing environment's overall computing context. A goal of such environments will be to minimise the number, size and configuration settings of the devices that a user has to carry. Ideally, the environment will contain ambient, embedded devices, such as embedded cameras and directional microphones. Pervasive environments involving context-aware multimodal computing will move away from the assumption in present systems that a single user provides all input to a device in a given period. Instead, the context-aware environment will need to determine each user's individual contribution to the input signal.

The primary motivating goal for the Nightingale multimodal system has been to decouple multimodal applications from particular input devices, giving the user the ability to roam seamlessly between her PAN, PLAN, and PWAN environments and allowing her to access her full suite of application services while taking advantage of the input and output devices that she encounters. Our architecture allows a user to approach any input device or devices of arbitrary modality available in her surroundings, and to immediately use them to continue her multimodal dialogue. After the invocation of a user identification component, the input device will be automatically tailored to provide an interpretation suitable for her current application processing context. As her application processing context changes, the interpretation of her intentions will be modified on the fly. The following section describes our agent-based architecture tailored for such pervasive computing environments.

Multimodal Architecture

The Nightingale model consists of an agent-based architecture spanning multiple autonomous computing devices in a user's PAN, PLAN, and PWAN, as illustrated in Figure 1. Application agents reside on the user's home server, his office computer, or other computers. Agents currently coordinate through a simplified context-plane service. The aim is to make available through the context plane the following aspects of service: details of the devices and services available, the various inputs and outputs that are available, historical usage patterns, and current sensed computational and environmental context along with aggregated and inferred contexts. For example, when the context data indicates the user is in the minimal-connectivity PAN environment, application processing will typically take place on his personal server. Input and output agents may reside on embedded systems within the user's environment (an embedded microphone in his vicinity, for example), or they may reside directly on his personal server (an agent controlling pen-based input through a paired Bluetooth Anoto pen, for example).

Figure 1: Mediated Input and Output in EMMA form

Input agents send application-defined, modality-neutral input to application agents in the form of EMMA, part of the W3C multimodal application framework. EMMA is a language for exchanging data between the interpretation and integration/interaction manager components. It consists of:

- Instance data, consisting of an application-specific interpretation of user intent. The format of the instance data is meaningful to the consumer of the EMMA document. Because of the ambiguity of interpreting various input sources, there may be more than one interpretation. This facilitates mutual disambiguation through fusion with other input modalities or the use of historical or current context data.
- A data model, which may be implicit. It specifies constraints on the format of the instance data, e.g. an XML schema or DTD for the XML instance data.
- Metadata, which allows the association of various pieces of information (annotations) with the produced instance data, e.g. timestamps (for multimodal fusion), the producing media (e.g. speech), confidence scores in the interpretation, information about the process that produced the interpretation, etc.

Application agents have a generation component to distribute modality-neutral, application-defined output to suitable modalities. We are exploring fission mechanisms for multimodal output, taking into consideration user context such as current location, ambient noise level, available modalities, user preferences, etc., as discussed in the following section.

Figure 2. Application supplied components for input and output agents

The upper half of Figure 2 shows how the raw input signal from the user is converted into application-defined input as EMMA documents. Applications supply modality-specific grammars and interpreters to input agents. The grammar controls input recognition by constraining what is recognised in the input signal. Grammars can consist of context-free grammars, such as for speech. We anticipate that in the future the use of context will drive both how the recognition is refined and how the subsequent fusion is achieved. Recognised input is passed to the semantic interpretation component in the input agent. The application-supplied interpreter converts raw recognised input into application-specific meaningful input, in the form of EMMA.

The lower half of Figure 2 shows how application-defined output is converted into an output signal delivered to the user. The styling component converts application-defined output into a format suitable for rendering by the modality-specific rendering engine. An application must supply a style sheet to specify this process. For example, a multimodal email application may send a list of headers to an output agent. The style sheet will convert this to a format suitable for output using a text-to-speech renderer, say, by creating a string suitable for the text-to-speech engine to dictate. In addition to providing interpreted input and styled output, applications can also gain direct access to the raw signal through a raw interface component of each input/output agent. Thus applications which would like to record audio directly from the user can access the raw microphone input from a speech agent. As with input, we expect to offer context services to guide the application outputs as the context plane develops.
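To make the structure described above more tangible, the following Java sketch builds a small EMMA-style document for the spoken command "read my email". The element and attribute names are illustrative only; they are not taken from the W3C EMMA specification or from the Nightingale code, and merely show how instance data and metadata annotations can be combined in one interpretation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Builds a small EMMA-style interpretation document for the spoken command "read my email". */
public class EmmaSketch {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        // Root container; element and attribute names are illustrative only.
        Element emma = doc.createElement("emma");
        doc.appendChild(emma);

        // One interpretation = instance data plus annotations (metadata).
        Element interp = doc.createElement("interpretation");
        interp.setAttribute("medium", "acoustic");                  // producing medium
        interp.setAttribute("mode", "speech");                      // input mode
        interp.setAttribute("confidence", "0.87");                  // recogniser confidence score
        interp.setAttribute("timestamp", "2004-09-07T10:15:00Z");   // usable for later fusion
        emma.appendChild(interp);

        // Application-specific instance data understood by the email agent.
        Element command = doc.createElement("emailCommand");
        command.setAttribute("action", "read");
        command.setAttribute("folder", "inbox");
        interp.appendChild(command);

        // Pretty-print the document to standard output.
        var t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.transform(new DOMSource(doc), new StreamResult(System.out));
    }
}
```

An input agent would emit a document of roughly this shape, and the integration component could fuse several such interpretations using the timestamp and confidence annotations.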

From the above description, it is apparent that the application must supply an application/modality-specific grammar and interpreter to input agents, and an application/modality-specific style sheet to output agents. The application writer must write these components. Once the identification component of an IO agent determines which user is to use it, the agent will load the necessary components for the user's current active application by mediation with the context and data planes. Our current implementation of the context plane uses a shared tuple space. This provides the advantages of temporal and referential decoupling [15]. The bootstrapping process is described for a pen input agent, email application agent, and text-to-speech output agent in Figure 3.

The first step (not shown on the diagram) is for the user to indicate they wish to use an IO agent. Currently, we use a manual GUI-based login mechanism for this. We are investigating the use of context data in the form of biometric and signal identification mechanisms (e.g. recognising a particular user by their speech or hand-writing). Agents must obtain the necessary application-supplied components and application binding information for a new user. The other steps are as follows:

1. Applications register themselves with the user's command agent in advance. The command agent typically runs on the user's personal server. The pen agent requests the ApplicationWithFocus context for the given user. The command agent responds with the unique agent ID of the email agent.

2. The pen agent requests the data plane location of the required grammar and interpreter for the email agent using the pen modality. The email agent responds with the locations.

3. The pen agent loads the grammar and interpreter from the data plane.

4-6. A similar process loads the styling component into the voice output agent. This also triggers the email agent to locate the binding information for the output agent by similar mediation with the context plane.

7. When a user speaks, commands relevant to the email application, e.g. "read my email", "next message", "delete this message", are recognised. The grammar constrains the recognised commands, and the interpreter produces a modality-neutral, application-defined input to send to the email agent.

8. Fusion of the input signal with multiple input modalities may occur in the email agent. The command is executed, and modality-neutral output is generated. The generation component sends the output to the relevant attached output agents, in this case the voice agent. The output is styled by the styling component, and rendered to produce an output signal.

Figure 3: Mediated Input and Output agent configuration

The user's application processing context is controlled by the command agent. When an input agent is bound to a new user, a command grammar/interpreter is loaded in addition to the application control grammar/interpreter. This separate grammar recognises commands which change the application processing context; specifically, the user's current active application. In our current implementation, the user changes the active application by explicitly requesting an application focus shift. For example, they may say, "Switch to my scrapbook application." This will trigger all currently bound input and output agents to reload their application-specific components and bind to the new application. We are investigating alternative strategies to allow users to control multiple applications simultaneously using different modalities. In addition, we are investigating a model whereby applications can update the command grammar on the fly in response to changes in their internal state. For example, in response to a new email, the email application agent may update the active command grammars to allow the user to listen to the new email immediately, even if email is not the currently active application. We are also planning to provide a mechanism for applications to refine their own control grammars in response to changes in application state.
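The following Java sketch mimics the mediation pattern of steps 1-3 with a toy, in-memory map standing in for the shared tuple space. It does not use the LIME API, and the key names (e.g. ApplicationWithFocus/alice) and data-plane URIs are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * A toy, in-process stand-in for the shared tuple space of the context plane.
 * This is NOT the LIME API; it only mimics the mediation pattern described in the text.
 */
class ToyContextPlane {
    private final Map<String, String> tuples = new ConcurrentHashMap<>();

    void put(String key, String value) { tuples.put(key, value); }
    String read(String key)            { return tuples.get(key); }
}

/** Sketch of the bootstrapping mediation between a pen input agent and an email agent. */
public class BootstrapSketch {
    public static void main(String[] args) {
        ToyContextPlane contextPlane = new ToyContextPlane();

        // The email agent registers itself and the data-plane locations of its components.
        contextPlane.put("ApplicationWithFocus/alice", "agent:email-42");
        contextPlane.put("agent:email-42/grammar/pen", "dataplane://grammars/email-pen.gram");
        contextPlane.put("agent:email-42/interpreter/pen", "dataplane://classes/EmailPenInterpreter.class");

        // The pen agent, once bound to user "alice", looks up the active application...
        String emailAgent = contextPlane.read("ApplicationWithFocus/alice");
        // ...and then the grammar and interpreter it must load for the pen modality.
        List<String> toLoad = List.of(
                contextPlane.read(emailAgent + "/grammar/pen"),
                contextPlane.read(emailAgent + "/interpreter/pen"));

        toLoad.forEach(location -> System.out.println("pen agent loads " + location));
    }
}
```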

Software Architecture

The Nightingale multimodal architecture is an object-oriented framework and is structured as a series of layers, beginning with the physical local data store and network. The data plane, not described here, is structured above this layer and provides persistence and distributed data management, including proactive caching strategies and application session support. Currently, the context plane consists of a tuple space for storing user, application, and IO agent context. The application plane consists of a generic agent-based infrastructure, incorporating command agents, generic input and output agent support, and the user identification support used by IO agents. The next layer contains the modality-specific code for input, output and user identification. The application programmer writes the topmost layer, consisting of application-specific logic. They define the structure of modality-neutral application input and output. In addition, they must write the interpreters/grammars/stylers for the specific input and output modalities they wish to support.

Implementation

As a point in the design space for this architecture, we have developed an implementation incorporating pen and speech as input modes, and text-to-speech as an output mode. We are planning a graphical IO mechanism based on XHTML/XUL. We have developed two applications to test our architecture: a multimodal email system, and a physical/digital scrapbook application. Our current implementation is written in Java to exploit cross-platform and byte code migration capabilities. Currently, LIME [12] is used to implement the context plane. By communicating on a shared multicast address, LIME offers the abstraction of a shared tuple space. All agents in multicast range of each other may coordinate in this fashion. To maintain scalability, agents should only coordinate with other agents in the user's immediate vicinity. For this reason, the limited range of wireless technologies such as 802.11, or using the Bluetooth PAN profile, actually provides an advantage: only agents in range of each other, and consequently only those useful within a PLAN for the user, will coordinate their functions by merging their tuple spaces.

Application-provided grammars, interpreters, and stylers have been implemented as Java classes. Application writers must define these components. The agent framework stores these classes in the data plane, and registers their locations in the context plane. When an input or output agent requests the location of the relevant components, the application agent framework responds with the data plane location. A custom Java class loader is then invoked to load the class as byte code directly from the data plane. In addition to speech, we have developed a novel pen-and-paper-based drawable user interface, using the Anoto pen and paper system [1], as a means of controlling applications. Users may draw their own interface, consisting of labelled buttons, i.e. squares and other shapes with hand-writing annotations. As such, a user-drawn email interface might consist of buttons labelled read, next, previous, delete, etc. (Figure 2). They "click" a button by ticking the box they drew, and this is interpreted by the pen agent to produce application-defined input to application agents. The results of an early user study of this form of interface are shown in Figure 4. They record the number of attempts users took to draw correctly recognised interface items. While these results are promising, they do point to the need for the use of historical context data along with other inference and gesture learning techniques.

Figure 4: Gesture recognition results from drawable user interface experiment.

In addition we have developed a physical/digital scrapbook using the Anoto pen and paper system to promote reminiscence and memory sharing activities among the elderly. Users use the physical scrapbook in the same way they use a regular scrapbook; they paste in photos, newspaper clippings, etc., and they annotate them with the pen. In addition they can add digital items, such as audio annotations, into their scrapbook. By drawing a square labelled audio they will be prompted to record a piece of audio. By ticking the box later, the audio will be replayed to them using an appropriate audio output agent in their vicinity. All digital items, including pen strokes, are stored in the data plane to automatically publish the scrapbook as a web page. To synchronise physical items in the scrapbook with the digital version, the scrapbook can be placed under the gaze of a high resolution digital camera.
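The custom class loader mentioned above could, in principle, look like the following sketch, which loads an application-supplied interpreter class from a directory standing in for the data plane. The class and path names are hypothetical; the actual Nightingale loader fetches byte code from the data plane rather than from the local file system.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Minimal class loader in the spirit of loading application-supplied interpreter
 * classes as byte code fetched from the data plane (here simply read from a directory).
 */
public class DataPlaneClassLoader extends ClassLoader {

    private final Path bytecodeDir; // stand-in for a data-plane location

    public DataPlaneClassLoader(Path bytecodeDir) {
        this.bytecodeDir = bytecodeDir;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try {
            // e.g. "EmailPenInterpreter" -> <dir>/EmailPenInterpreter.class
            byte[] bytes = Files.readAllBytes(bytecodeDir.resolve(name + ".class"));
            return defineClass(name, bytes, 0, bytes.length);
        } catch (Exception e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: load an interpreter class published by the email application.
        DataPlaneClassLoader loader = new DataPlaneClassLoader(Path.of("/tmp/dataplane"));
        Class<?> interpreter = loader.loadClass("EmailPenInterpreter");
        Object instance = interpreter.getDeclaredConstructor().newInstance();
        System.out.println("Loaded interpreter: " + instance.getClass().getName());
    }
}
```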

Context Awareness

To simplify the integration of new users and devices in our pervasive computing environment, and to allow our applications to be context-aware, we are developing a context infrastructure [9] – the Context Plane. The Context Plane actively seeks devices in the pervasive computing environment and is able to prioritise them based not only on computing context, but on user and application preferences as well as physical context such as location and co-location with other users.

For example, a room might offer a large, wall-mounted public display screen and a user might have a personal device with a smaller screen, such as a PDA. In the past, the user has indicated a preference for the wall-mounted display by asking for a modality switch when offered smaller screens. However, when the user walks into this room, the currently running application is an email application that has informed the Context Plane that the content is confidential; thus it might be inappropriate to display it on a public display. But the Context Plane also knows that there are no other users co-located with the current user (i.e. in the same room). Note that for some applications, it might also be feasible to utilise audio output devices such as a speaker or a Bluetooth headset. The application is currently using the PDA as a display device, so the Context Plane must decide what the most appropriate output modality is after the user walks into this room.

Our approach to the resolution of this context looks at combining simple user-driven, rule-based, probabilistic and temporal logic inference techniques. Further, we aim to explore the use of an evidential reasoning model to simplify the way in which an application can provide context to the Context Plane. An ontological approach is also being considered because it provides a flexible and scrutable manner in which to provide inference rules, as well as describing a common language for applications to describe such rules and provide context. However, this alone is not enough; we need not only to find a feasible solution to the selection of modality but also to determine the best solution, given the current context.

As we intend to pursue a risk-averse context awareness, the best solution may come directly from the user, rather than from a fully automatic approach. To achieve this, we require a method for the user to override the decision. In this case we would select the next most appropriate modality; thus we also need to rank the possibilities. Furthermore, we aim to implicitly learn from this decision, as there is evidence to suggest that, in the current context, the initially chosen modality was inappropriate in some way. Our initial approach is to explore reinforcement learning techniques. Another aspect being considered is how to make decisions for new applications, that is, applications for which we do not have any historical evidence to draw from. In this circumstance we would like to harness evidence from other, related applications in proportion to their usage by the current user. Our scenario is similar to those motivating other context-aware projects such as iROS [15] and GAIA [16].

CONCLUSIONS AND FUTURE WORK

Our early work focused on the development of a series of multi-modal demonstrators across a range of devices. This work has informed the specification and design of our multimodal, context and data management architectural elements. While this incremental approach to research and development isn't suitable in a mature research field, we have found it beneficial in a developing area such as pervasive computing, without established standards, operating system support or data protocols.

Our future work aims at the evaluation of our current demonstrators with a group of elders, the development of a peer-to-peer multimodal application framework, the exploration of ontology-based inference methods for context, a lightweight distributed database for mobile data management, and SIP-based protocols for handoff between management elements in Nightingale.

REFERENCES

1. Anoto. Development guide for service enabled by Anoto functionality. Technical report, Anoto, http://www.anoto.com, 2002.
2. M. Bodell, M. Johnston, S. Kumar, S. Potter, and K. Waters. W3C multimodal interaction framework. Note, W3C, http://www.w3.org/TR/mmi-framework/, May 2003.
3. R. A. Bolt. "Put-that-there": Voice and gesture at the graphics interface. In Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques, 262-270, Seattle, Washington, USA, July 1980. ACM Press.
4. B. Brumitt, B. Meyers, et al. EasyLiving: Technologies for intelligent environments. In Proceedings of the 2nd International Symposium on Handheld and Ubiquitous Computing (HUC 2000), 12-29, Bristol, UK, Sept. 2000. Springer-Verlag.
5. G. Chen and D. Kotz. A survey of context-aware mobile computing research. Technical Report TR2000-381, Department of Computer Science, Dartmouth College, Nov. 2000.
6. H. Chen and T. Finin. An ontology for context aware pervasive computing environments. In Stuckenschmidt [22].
7. P. R. Cohen, M. Johnston, D. McGee, et al. QuickSet: Multimodal interaction for distributed applications. In Proceedings of the 5th ACM International Multimedia Conference, 31-40, Seattle, Washington, USA, Nov. 1997. ACM Press.
8. A. K. Dey, D. Salber, and G. D. Abowd. A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. In HCI [11], 97-166.
9. J. I. Hong and J. A. Landay. An infrastructure approach to context-aware computing. In HCI [11], 287-303.
10. M. W. Kadous and C. Sammut. MICA: Pervasive middleware for learning, sharing and talking. In IEEE PerCom Workshops 2004, 176-180, 2004.
11. T. P. Moran, ed. Human-Computer Interaction Journal, Special Issue on Context-Aware Computing, vol. 16. Lawrence Erlbaum Associates, Inc., Aug. 2001.
12. A. L. Murphy, G. P. Picco, and G.-C. Roman. LIME: A middleware for physical and logical mobility. In Proceedings of the 21st International Conference on Distributed Computing Systems (ICDCS-21), 524-533, Phoenix, Arizona, USA, Apr. 2001.
13. S. Oviatt, P. Cohen, J. Vergo, L. Duncan, B. Suhm, J. Bers, T. Holzman, T. Winograd, J. Landay, J. Larson, and D. Ferro. Designing the user interface for multimodal speech and pen-based gesture applications: State-of-the-art systems and future research directions. Human Computer Interaction, 15(4):263-322, 2000.
14. S. L. Oviatt. The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications, ch. 14: Multimodal interfaces, 286-304. Lawrence Erlbaum Associates, Inc., Mahwah, New Jersey, USA, 2003.
15. S. R. Ponnekanti, B. Johanson, et al. Portability, extensibility and robustness in iROS. In Proceedings of the IEEE International Conference on Pervasive Computing and Communications, Dallas-Fort Worth, Texas, USA, Mar. 2003.
16. A. Ranganathan and R. H. Campbell. A middleware for context-aware agents in ubiquitous computing environments. In Proceedings of the ACM/IFIP/USENIX International Middleware Conference, 143-161, Rio de Janeiro, Brazil, June 2003. Springer.
17. A. Ranganathan, R. E. McGrath, R. H. Campbell, and M. D. Mickunas. Ontologies in a pervasive computing environment. In Stuckenschmidt [22].
18. N. Reithinger, J. Alexandersson, T. Becker, et al. SmartKom – adaptive and flexible multimodal access to multiple applications. In Proceedings of the International Conference on Multimodal Interfaces 2003, Vancouver, BC, Nov. 2003.
19. W. N. Schilit. A System Architecture for Context-Aware Mobile Computing. PhD thesis, Graduate School of Arts and Sciences, Columbia University, May 1995.
20. A. Schmidt, K. A. Aidoo, A. Takaluoma, U. Tuomela, K. V. Laerhoven, and W. V. de Velde. Advanced interaction in context. In Proceedings of the 1st International Symposium on Handheld and Ubiquitous Computing (HUC'99), 89-101, Karlsruhe, Germany, Sept. 1999. Springer-Verlag.
21. T. Strang, C. Linnhoff-Popien, and K. Frank. CoOL: A context ontology language to enable contextual interoperability. In Proceedings of the 4th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems (DAIS 2003), 236-247, Paris, France, Nov. 2003. Springer-Verlag.
22. H. Stuckenschmidt, ed. Proceedings of the IJCAI'03 Workshop on Ontologies and Distributed Systems, Acapulco, Mexico, Aug. 2003. Online proceedings: http://www.cs.vu.nl/~heiner/IJCAI-03/.
23. X. H. Wang, D. Q. Zhang, T. Gu, and H. K. Pung. Ontology based context modeling and reasoning using OWL. In Proceedings of the 2nd IEEE International Conference on Pervasive Computing and Communications, 18-22, Orlando, Florida, USA, Mar. 2004. IEEE Computer Society Press.

Multimodal Interactions with an Instrumented Shelf Rainer Wasinger1, Michael Schneider1, Jörg Baus2, Antonio Krüger2 1 DFKI, GmbH, Stuhlsatzenhausweg 3 66123 Saarbrücken, Germany {rainer.wasinger, michael.schneider}@dfki.de 2 Saarland University P.O.Box 15 11 50 66041 Saarbrücken, Germany {baus, krueger}@cs.uni-sb.de

Abstract. In this paper, we describe the initial implementation of our application demonstrator called 'ShopAssist'. This application aids users in product queries within a shopping scenario. We describe the wide range of input modalities that our application supports, such as speech, handwriting, and intra- and extra-gestures, and the mixed modality combinations that promote advanced user interaction with real-world and virtual objects.

1 Introduction

Users are no longer limited to the "desktop computing" paradigm. Applications are now mobile and ubiquitous, and may span multiple and changing contexts. Depending on a user's current state, these contexts often require the use of different input and output modalities such as speech, handwriting and gesture. In addition, users often have a preference for the input modalities they wish to communicate through at any given moment, and this preference is also largely influenced by surrounding environmental factors such as background noise, crowds, and access to the underlying physical and virtual data spaces. Data spaces are no longer limited to graphical objects on a computer display. Modern ubiquitous and mobile computing scenarios now require user interactions that span both virtual and physical spaces. It is these concepts that this paper discusses. In section 2, we describe the scenario that we are implementing, and include motivations for our work. This is followed in section 3 by an outline of the architecture on which the scenario is based. In section 4, we discuss the interaction forms that we are implementing, and the flexibility that can arise from mixing and matching input types. We provide our conclusions in section 5.


2 Scenario

Although research is now being conducted on interfaces for ubiquitous computing [1] and for the domain of shopping [5, 2], the combination of these areas with that of multimodal interaction [6, 4] is only slowly gaining momentum. Our ShopAssist application combines the context of shopping with that of mobile and ubiquitous computing and multimodal interaction. Two different scenarios are currently catered for. The first scenario supports the use of a shopping trolley in a grocery shop, and aids the user through plan recognition strategies [5]. The second scenario supports the use of a pocket PC within, for example, an electronics shop, and supports the user by providing a rich range of interaction possibilities. It is the second scenario that we focus on in this paper. Consider a user in possession of a pocket PC, browsing through a "real-world" electronics store. The user connects to a data container of their liking (i.e. a shelf), and then waits for the relevant product information for each of the objects in the shelf (e.g. digital cameras) to be automatically downloaded. If the shelf does not contain all of the models that the user is currently interested in (perhaps because they are out of stock, or because the store does not stock them), the user will still be able to download the required product information from the environment's server, and use it, for example, in product comparisons.

Fig. 1. A combined virtual-physical interaction used to compare two products

Upon synchronization with the data container, the user has the ability to interact with the virtual set of products downloaded onto the pocket PC's display, with the physical set of products on the shelf, or with a combination of physical and virtual products making up the data space. Fig. 1 shows an example of a combined virtual-physical interaction in the form of a product comparison query. Upon downloading the data container, the user also has the option of disconnecting from the shelf and the surrounding environment and browsing entirely offline. This limits the range of interactions that are available to the user, but guarantees a high level of user privacy, especially if the user is dubious about the integrity of the shop.

Depending on background noise levels within the electronics store, the user may decide to interact with the products through the medium of speech. If the store becomes very crowded, the user may switch to combined handwriting-gesture interaction. They may use extra-gestures to access the product (i.e. picking up or putting down a real-world object), or if the shop is too crowded or the products are locked away, the user may instead decide to use intra-gestures (i.e. pointing via stylus to objects on the pocket PC's display). A motivation for shops to provide new interaction types such as extra-gesture is that this supports interaction shopping. In contrast to window shopping (where a user is generally limited to viewing products locked behind a glass window), interaction shopping permits a user to physically interact with and query the objects around them. Based on our initial observations, we believe that tangible product queries provide the user with a certain fun factor less commonly found when browsing products solely on a computer display. Even without being able to physically touch an object, multimodal interaction can still benefit the more limiting form of window shopping. As an example, consider shopping on a Sunday when the shops are closed, or even during the week outside of business hours. Interaction through modalities such as speech, handwriting and intra-gesture would in this case still provide a user with enough flexibility to purchase products. In our scenario, the environment also contains infrastructure to support public displays. When such displays are not currently in use by other users, a user may request detailed information on particular products to be displayed there, instead of on the comparably smaller display of the pocket PC (see Fig. 2). Upon deciding on a product to purchase, the user can finally add the product to their shopping list and either continue to browse, or progress to the counter. Our public display infrastructure works equally well for interaction shopping and window shopping, but currently only supports a single user at a time.

Fig. 2. Both private and public screens can be used to provide product information


3 Architecture

As seen in the scenario above, there are three main components to this ubiquitous computing environment: the user, the device, and the environment [3]. The environment can be further decomposed into rooms, data containers and data objects, in which data containers refer to items such as shelves and tables, and data objects refer to the individual products contained within a data container. The shelves are fitted with RFID antennas, while the products are fitted with RFID tags. In this way, the environment's server can identify which products have been picked up or put back into a shelf. The server runs on an RMI infrastructure and has access to the underlying product databases, which may be distributed throughout the environment. The server also supports multiple client types such as shopping trolleys and pocket PCs. In this shopping scenario, the pocket PC is the central communication portal between the user and the environment. Data is communicated in XML format over a TCP/IP socket connection, and contains information on the products within a container, as well as the associated dynamic language grammars for each product type. With the exception of extra-gesture recognition, all of the interaction processing is performed locally and in real time on the pocket PC itself. In situations where multiple shelves exist in a single room, infra-red beacons are used to identify each shelf. The downloading of the container data onto the pocket PC takes place over a wireless LAN connection. The ShopAssist's input modalities include speech, handwriting, intra- and extra-gesture, while its output modalities include speech, text and graphics. IBM's Embedded ViaVoice V4.2¹ is used for speech recognition, while Microsoft Transcriber V1.5 is used for character recognition. The grammars are derived from product types within the product database, and are dynamically loaded upon synchronization with the shelf. Each grammar contains separate values for the modalities of speech and handwriting. The speech grammars may theoretically contain any number of words; however, our grammars contain, and have only been tested with, around 50-100 unique words. Fig. 3 shows a combined handwriting-gesture interaction that has been mapped to valid grammar entries. "CMD_H" refers to a handwriting event that has been mapped to a COMMAND, and "OBJ_GI" refers to an intra-gesture event that has been mapped to an OBJECT referent. The system uses ScanSoft's RealSpeak Solo² for concatenative speech synthesis output, and can also fall back to IBM's Embedded ViaVoice formant synthesizer depending on the available memory. Whereas the concatenative synthesizer sounds more natural, the formant synthesizer sounds more robotic, but requires much less memory.

¹ IBM EVV, http://www-306.ibm.com/software/pervasive/embedded_viavoice_enterprise/
² ScanSoft RealSpeak Solo, http://www.scansoft.com/realspeak/mobility/
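As a rough illustration of the kind of container data exchanged over the socket connection, the following sketch shows how a pocket PC client might parse such a message. The element names, attribute values and grammar contents are our assumptions for illustration, not the actual ShopAssist schema.

    import xml.etree.ElementTree as ET

    # Hypothetical container message: product data plus per-product-type grammars
    # (placeholder values only).
    CONTAINER_XML = """
    <container id="shelf-1" type="digital cameras">
      <product id="cam-01" name="PowerShot S1">
        <attribute name="mega pixels" value="3.2"/>
        <attribute name="price" value="399"/>
      </product>
      <grammar modality="speech">compare | price | mega pixels</grammar>
      <grammar modality="handwriting">compare | price | mega pixels</grammar>
    </container>
    """

    def parse_container(xml_text):
        """Extract product attributes and the dynamic grammars from a container message."""
        root = ET.fromstring(xml_text)
        products = {p.get("id"): {a.get("name"): a.get("value")
                                  for a in p.findall("attribute")}
                    for p in root.findall("product")}
        grammars = {g.get("modality"): g.text.strip() for g in root.findall("grammar")}
        return products, grammars

    products, grammars = parse_container(CONTAINER_XML)
    print(products["cam-01"]["mega pixels"], grammars["speech"])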


Fig. 3. An example of how a handwriting event (1) is recognized by the character recognizer (2), and then mapped to a valid grammar entry (3). Also note the summarized modality recognition results (4)

4 Virtual and physical user interactions

Our system supports the following input types: speech, handwriting, and gesture, whereby gesture can be further categorized as being either of type intra-gesture or extra-gesture. Intra-gestures refer to stylus input on the touch screen of the pocket PC, while extra-gestures refer to physical real-world interaction with products. We currently only have one intra-gesture type (i.e. "intra-point"), and two extra-gesture types (i.e. "extra-pickup" and "extra-putback"). Intra-gestures could be extended in the future to also account for simple commands like "delete-from-shopping-list" and "add-to-shopping-list", while extra-gestures could be extended to include functionality like "extra-point". In comparison to an "extra-pickup" command, which requires physically touching a real-world object (based on RFID technology), an "extra-point" command would allow the user to select an object from a distance (for example, based on barcode scanning for very small distances, or based on optical marker recognition for longer distances). A primary goal of multimodal systems is to convert multimodal input into a language that is not dependent on any modality (i.e. uni-modal). All user input in our system is converted into the following uni-modal user input type, via a modality fusion module (first versions of which are published under [8]):

<COMMAND> + <OBJECT>    (1)

For the context domain "digital cameras", COMMAND refers to values such as "compare", "price" or "mega pixels", and OBJECT refers to values such as "PowerShot S1" or "PowerShot S60". As an example (see Fig. 4), the combined speech and gesture input "How many mega pixels does this camera have?" is mapped to a <COMMAND> + <OBJECT> pair in which speech supplies the command "mega pixels" and the accompanying intra-gesture supplies the object referent.
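As a minimal sketch of this fusion step (the class and function names are hypothetical and not the actual module, which is described in [8]), time-stamped modality events can be merged into the uni-modal command-object result regardless of which modality produced each part:

    from dataclasses import dataclass
    from typing import List, Optional, Tuple

    @dataclass
    class ModalityEvent:
        modality: str   # "S" speech, "H" handwriting, "GI" intra-gesture, "GE" extra-gesture
        slot: str       # "CMD" or "OBJ"
        value: str      # e.g. "mega pixels" or "PowerShot S1"
        timestamp: float

    def fuse(events: List[ModalityEvent],
             window: float = 2.0) -> Optional[Tuple[str, List[str]]]:
        """Combine events within a short time window into one <COMMAND> + <OBJECT> result."""
        if not events:
            return None
        start = min(e.timestamp for e in events)
        recent = [e for e in events if e.timestamp - start <= window]
        commands = [e.value for e in recent if e.slot == "CMD"]
        objects = [e.value for e in recent if e.slot == "OBJ"]
        if not commands or not objects:
            return None            # incomplete input: wait for more events
        return commands[0], objects

    # Example: speech supplies the command, an intra-gesture supplies the object.
    events = [ModalityEvent("S", "CMD", "mega pixels", 0.1),
              ModalityEvent("GI", "OBJ", "PowerShot S1", 0.4)]
    print(fuse(events))   # ('mega pixels', ['PowerShot S1'])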

Fig. 4. The circled parse information shows a mapping of the modalities speech (S) and intra gesture (GI) to that of the uni-modal language elements command (CMD) and object (OBJ)

Although we have stated above that speech, handwriting and gesture input can all be used to form a uni-modal result, not all of the possible combinations of these inputs are currently available in our system. As an example, if three different modalities are available for both the COMMAND and a single OBJECT referent, we would have 9 modality combinations that arise from mixing modalities together, and 27 modality combinations for a single COMMAND and two OBJECT references. This is even before considering the effects of overlaid modality information, as in the case of the following example: "How many mega pixels does the PowerShot S50 have?", in which both speech and gesture are used to define the same object referent. Fig. 5 shows the input modality combinations that are being implemented.
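To make the arithmetic behind these figures concrete, a short enumeration over three hypothetical modalities per slot (a sketch, not part of the system) reproduces the 9 and 27 combinations mentioned above:

    from itertools import product

    modalities = ["speech", "handwriting", "intra-gesture"]   # three options per slot

    # one COMMAND and one OBJECT referent
    single_object = list(product(modalities, repeat=2))
    print(len(single_object))    # 9

    # one COMMAND and two OBJECT referents
    two_objects = list(product(modalities, repeat=3))
    print(len(two_objects))      # 27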


Fig. 5. Input modality combinations currently being implemented in the ShopAssist demonstrator

Indeed some of the implemented modality combinations require new interaction metaphors to work. For example, by providing the user with a visual “What Can I Say” (see Fig. 6), the system can evaluate modality combinations in which not only the OBJECT but also the COMMAND are obtained via intra-gesture:

Fig. 6. Displayed as scrolling text, a visual “what can I say” can allow the user to access command functionality through stylus intra-gestures

As described in [7], situational statements about the surrounding environment may affect which input modalities to use. For example, a very noisy environment may require the use of combined gesture-handwriting interaction, and a very crowded environment may require the use of sole intra-gesture interaction. Aside from environment characteristics, user requirements also affect multimodal input. For example, if the user is on the move, speech interaction may be the best form of human-computer interaction to use. Device requirements further affect which modalities should be used; for example, in our scenario, speech input requires the user to press a button on the PDA to start and stop an utterance³. Handwriting input requires a touch screen and a stylus. Intra-gestures require a touch screen and finger input, and extra-gestures require only the use of a hand (and of course the availability of a real-world object to interact with).

³ It is interesting to note that unlike office environments where background noise may be negligible enough for a recognizer to be always actively listening, mobile scenarios are much less likely to support such ideal conditions.


Each situational statement can contribute to determining which modalities are best to use. A future extension to the ShopAssist will be to automate the recognition of situational statements and to alert the user to the most appropriate input modalities to use, or alternatively to pro-actively bias certain input types based on these situational statements in order to improve input recognition accuracy.
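A minimal sketch of how such situational statements might bias modality choice follows; the statement names and weights are assumptions for illustration only, not those used in ShopAssist.

    # Hypothetical situational statements and a naive scoring of input modalities.
    # Higher scores mean the modality is more appropriate in the current situation.

    BASELINE = {"speech": 1.0, "handwriting": 1.0, "intra-gesture": 1.0, "extra-gesture": 1.0}

    # How each (assumed) situational statement biases the modalities.
    BIASES = {
        "noisy":        {"speech": -0.8},
        "crowded":      {"extra-gesture": -0.6, "handwriting": -0.3},
        "user_moving":  {"handwriting": -0.5, "intra-gesture": -0.3, "speech": +0.4},
        "products_locked_away": {"extra-gesture": -1.0},
    }

    def recommend(statements):
        """Rank modalities from most to least appropriate for the given statements."""
        scores = dict(BASELINE)
        for statement in statements:
            for modality, delta in BIASES.get(statement, {}).items():
                scores[modality] += delta
        return sorted(scores, key=scores.get, reverse=True)

    print(recommend(["noisy", "crowded"]))
    # ['intra-gesture', 'handwriting', 'extra-gesture', 'speech']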

5 Future Work and Conclusions

Future work will now turn towards usability testing. We will try to determine the level of learning required for multimodal interaction by unfamiliar users, and the level of acceptance that users will have for new interaction types, especially within a public environment. The system's ability to sustain acceptable recognition rates when placed outside of a controlled test environment will also need to be studied. This paper has described the potential use of multimodal interactions within a ubiquitous shopping scenario. We have described the architecture required for such an implementation, and have also outlined the interaction modalities and modality combinations that may be relevant for physical and virtual data spaces.

References
1. Dey, A., Ljundstrand, P., Schmidt, A., "Distributed and Disappearing User Interfaces in Ubiquitous Computing", CHI Workshop, (2001).
2. FutureStore. Future store initiative, June 2003. Official website: http://www.future-store.org
3. Kray, C., Wasinger, R., Kortuem, G., "Concepts and issues in interfaces for multiple users and multiple devices", Workshop on Multi-User and Ubiquitous User Interfaces (MU3I) at IUI/CADUI, (2004), pp. 7-12.
4. Oviatt, S.L., "Multimodal interfaces", In The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, (2003), pp. 286-304.
5. Schneider, M., "A Smart Shopping Assistant utilizing Adaptive Plan Recognition", ABIS Workshop on adaptivity and user modelling in interactive software systems, (2003), pp. 331-334.
6. Wahlster, W., "Towards Symmetric Multimodality: Fusion and Fission of Speech, Gesture, and Facial Expression", Proceedings of the 26th German Conference on Artificial Intelligence, (2003), pp. 1-18.
7. Wasinger, R., Oliver, D., Heckmann, D., Braun, B., Brandherm, B., Stahl, C., "Adapting Spoken and Visual Output for a Pedestrian Navigation System, based on given Situational Statements", ABIS Workshop on adaptivity and user modelling in interactive software systems, (2003), pp. 343-346.
8. Wasinger, R., Stahl, C., Krüger, A., "Robust speech interaction in a mobile environment through the use of multiple and different media input types", Proc. of EuroSpeech, (2003), pp. 1049-1052.


Personal Ontologies for feature selection in Intelligent Environment Visualisations

David J. Carmichael, Judy Kay, Bob Kummerfeld
School of Information Technologies
University of Sydney, NSW 2006, Australia
{dcarmich,judy,bob}@it.usyd.edu.au

ABSTRACT

A central problem of Intelligent Environments is the Invisibility Problem. This arises when computing services are hidden in the environment in order to be unobtrusive and natural, and yet the user is unable to make use of them because they are unaware of their existence. In this paper we describe a key aspect of the invisibility problem: the definition of personal models of places and services. We describe the use of Verified Concept Mapping to enable users to define personal ontologies of places and their associated people, sensors and services.

Keywords: Intelligent Environments, Invisible Information, User Modelling, World Modelling, Personal Ontologies

1. INTRODUCTION

One branch of Ubiquitous Computing aims to augment the physical environment with many computational devices and sensors to assist people in their activities. Such environments are called Intelligent Environments (IEs).

Unobtrusiveness and Invisibility are goals of good intelligent environments. Systems should be so natural and well hidden in the environment that unless a user knows what components are there, they are almost invisible. This invisibility also has a problem: if things are so well hidden, how does a new user find out what services are available to them and what sensors are detecting them? We call this The Invisibility Problem.

As an example consider Fred and Jenny, new postgraduate students commencing their degrees in a smart building. The building contains a number of invisible or non obvious systems such as bluetooth based location sensors, e-ink based notice boards, and wireless networking. As Fred enters the building he passes a notice board displaying undergraduate notices which he ignores and keeps walking. He does not realise that, were he to identify himself as a postgraduate student, the notice board would present items relevant to postgraduates. Jenny, on the other hand, enters the building carrying a bluetooth-enabled phone and is detected by the location system. As she approaches the notice boards, they change to show information she might be interested in. She is somewhat disturbed by this as she doesn't know how the system knows she is there. These two cases give examples of both parts of the invisibility problem. In the first example, Fred is unaware of the services available to him and in the second, Jenny is unaware of the sensors detecting her. Related to the Invisibility problem, we must also consider the fact that such environments can be personalised. In the example above, the personalisation of the environment - tailoring the notice boards to the nearby users - is essentially a complicating factor. The fact that the notices were different for Fred and Jenny is yet another thing about the environment which must be explained to them.

However, personalisation can be useful if it is done more effectively. If the system models users more closely, it could reason that neither Fred nor Jenny know about the functionality of the smart notice boards or the bluetooth location sensors and inform them.

In an IE there may be thousands of known places, devices, sensors and services. In order to present useful information to the user, we must develop ways to reason about what the user is interested in and already knows about. We want to provide personalised information about services and sensors in a personalised IE.

One key element to our work is the definition of personal models of places and their associated sensors and services. We tackle the personalisation of this with Verified Concept Mapping to get the user to input what they know about and are interested in. Users are given a partially constructed concept map and can modify it to create a personal ontology of concepts within the Intelligent Environment. The rest of the paper is organised as follows. Section 2 describes the MyPlace system. Section 3 describes Verified Concept Mapping and how we use it to get personal ontologies from the user. Section 4 describes related work. Section 5 contains a discussion of future work and conclusions.


Figure 1: An extremely simplified world model showing the linkages between items

2. MYPLACE

In this section we describe the MyPlace system [8] which provides personalised views of ubiquitous computing environments in order to alleviate issues such as the invisibility problem described above. The novel aspects of this system are the uniform representation of users, places, locations, devices and services, the double personalisation and the accretion-resolution used for the world model.

2.1 Uniform Representation

The uniform representation means that all entities in the system are organised into an inheritance hierarchy of concepts. General concepts form the higher level nodes and the entities modelled become more specific as one moves down the tree. In Figure 1 a particular room (G61b) is a child of the more general concept of Place. The nodes of this tree are also interconnected by their semantic relationships. In this example ‘BTLocationSensor-1’ is located in ‘G61b’. Each concept can have simple valued attributes in addition to the semantic linkages to other entities. Concepts can have arbitrary attributes and linkages as there is no fixed schema.
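A minimal sketch of such a uniform, schema-free representation follows; the class and property names are illustrative assumptions, not the MyPlace implementation.

    # Sketch of a uniform concept node: every entity (place, person, device,
    # sensor, service) is the same kind of object, arranged in an inheritance
    # hierarchy and linked to other entities by named semantic relations.

    class Concept:
        def __init__(self, name, parent=None):
            self.name = name
            self.parent = parent           # inheritance: more general concept
            self.attributes = {}           # simple valued attributes, no fixed schema
            self.links = []                # semantic linkages: (relation, other concept)

        def link(self, relation, other):
            self.links.append((relation, other))

    place = Concept("Place")
    room = Concept("G61b", parent=place)              # a particular room is a child of Place
    sensor = Concept("BTLocationSensor-1", parent=Concept("Sensor"))
    sensor.link("isLocatedIn", room)                  # semantic relation between nodes
    sensor.attributes["range_m"] = 10                 # arbitrary attribute (assumed value)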

2.2 Dual Personalisation

MyPlace [8] includes double personalisation: the content is personalised for the user in two stages. The architecture is shown in Figure 2. First the Security Manager generates a subset of its total world model based on what the user is allowed to use or know about. Second, the user's personal device selects what parts of the restricted world model the user should see.

Figure 2: The delivery architecture for MyPlace.

This model gives two major benefits. Firstly the IE can select what the user is allowed to see. For example, postgraduates are not allowed to use the expensive colour printer, and thus are not told about its existence. Secondly, the user is able to have preferences about what they care about without these preferences being released to the IE, thus protecting their privacy.

2.3 Accretion-Resolution

Another interesting aspect of MyPlace is the manner in which changing values are represented. As information is received from sensors in the system it is simply stored as evidence without interpretation - this is the accretion step. Then, when a value is requested, the stored evidence is examined and resolved to a value. For example in Figure 1, when BobsPhone is detected by BTLocationSensor-1, it is noted in the models for both entities as time stamped evidence. When we wish to know Bob's location, all the evidence about what devices Bob is carrying and by what sensors they have been detected is resolved to determine a value. The advantage of this approach is that it saves the system from having to put effort into interpreting sensor data as it enters the system.
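A sketch of the accretion-resolution idea; the evidence structure and resolver below are assumptions for illustration, not the actual system.

    import time

    class EntityModel:
        """Accretion: evidence is stored uninterpreted as it arrives."""
        def __init__(self, name):
            self.name = name
            self.evidence = []             # list of (timestamp, source, observation)

        def accrete(self, source, observation):
            self.evidence.append((time.time(), source, observation))

    def resolve_location(person_model, carried_devices):
        """Resolution: interpret the accumulated evidence only when a value is
        requested, here by taking the most recent sensor detection of any device
        the person is carrying."""
        detections = [(ts, obs["sensor_location"])
                      for dev in carried_devices
                      for (ts, src, obs) in dev.evidence
                      if obs.get("type") == "detected"]
        if not detections:
            return None
        return max(detections)[1]          # location attached to the newest evidence

    phone = EntityModel("BobsPhone")
    phone.accrete("BTLocationSensor-1", {"type": "detected", "sensor_location": "G61b"})
    bob = EntityModel("Bob")
    print(resolve_location(bob, [phone]))  # 'G61b'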

3. VERIFIED CONCEPT MAPPING

One of the core challenges in defining personal ontologies of places involves user interface support for the creation of personal ontologies of places and their associated sensors and services. We tackle this with VCM [4], the verified concept mapper. This is based on a well established knowledge elicitation technique called concept mapping. Its origins are in education [11, 10, 9] where it was first used to enable learners to externalise their understanding of the relationships between the concepts in a learning area. It has been widely used in education and has also been used for knowledge elicitation [5].

A typical use of concept mapping, as we apply it, is to provide the user with a collection of concepts, some of which have already been placed into a concept map. This defines triples in terms of pairs of concepts and the relationships between them. In addition to this obviously semantic element, concept maps also make use of the two-dimensional layout. The user places concepts that the user considers similar in a vertical line. Also, concepts further to the left in the map are high level, compared with those at their right. The user also strives to make the maps symmetrical and pleasing to the eye. Essentially, the physical location of concepts in the map implies some relationships even when they are not captured as semantic links. Moreover, the presentation is intended to be natural, making it easily understood and memorable. These properties of concept mapping have contributed to its wide use and success in educational contexts. This is why we considered it an appealing foundation for our interface that would support users in amending and extending foundation ontologies of spaces with personal ontological differences. Its usefulness in education, even in the case of young children, suggests that it might be usable by a diverse range of users who want to understand and control the models that drive a personalised intelligent environment. VCM extends basic concept mapping by verifying the maps created. Once the user indicates they have completed a map, the verification stage checks the map and provides two forms of response: suggestions for the user to consider and a summary of the inferences it has drawn from the map. The suggestions are based upon two classes of checks. First, the author of the mapping task can define map features that they expect to be part of the map. In the context of the MyPlace work, authors would define the foundation map and associated concepts and relations to be used in individual maps. If any of the expected features is missing in the user's personalised map, this is signalled to the user, either via a default message or a specialised one written by the author of the foundation map. For example, if all staff are expected to identify their own office and workplaces, omission of these would cause a message asking them to check this is what they intended. The second form of suggestion relates to unexpected features found in the map. This operates similarly, being coded by the authors of the foundation map, giving either default or specialised messages. The second part of the verification phase is the list of amendments the user has made to the map. This lists each new concept as a triple defining the concept it is linked to and the relationship on that link. Since this would generally be a quite short list, it should be plausible for a user to check this. It also lists elements the user has deleted from the foundation map.

Figure 3: Screenshot from the MyPlace system for Fred, a male staff member in the school.

This list is a basis for subsequent explanations of differential personalised behaviour in MyPlace: if a person asks for explanations of the way MyPlace presents information, these will be expressed in terms of their own changes to the foundation ontology. The essence of the verification is an interaction with the user to improve the likelihood that accidental errors are identified: it is very easy for a user to click on the wrong concept or link. Once the verification phase is complete, we treat the map as a reliable model of the user's ontology of the space. Moreover, the user is aware of the elements they have altered and have had their attention directed to unusual or unexpected elements that they have defined. The remainder of this section describes how the user interacts with VCM. To set the interaction in context, Figure 3 shows a screenshot from MyPlace.
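A rough sketch of the two verification checks described above (expected features and unexpected additions) is given below; the triple format and messages are assumptions for illustration, not the VCM implementation.

    # A concept map is represented here as a set of (concept, relation, concept) triples.

    def verify(user_map, expected_features, foundation_map, messages=None):
        """Return suggestions for the user: expected triples that are missing,
        plus additions and deletions relative to the foundation map."""
        messages = messages or {}
        suggestions = []
        for triple in expected_features:
            if triple not in user_map:
                suggestions.append(messages.get(
                    triple, f"Expected feature missing: {triple}. Is this intended?"))
        additions = user_map - foundation_map
        for triple in additions:
            suggestions.append(f"You added {triple}. Please confirm this is correct.")
        deletions = foundation_map - user_map
        return suggestions, sorted(additions), sorted(deletions)

    foundation = {("G61a", "Is", "Office"), ("Madsen", "Contains", "G61a")}
    user = {("Madsen", "Contains", "G61a"), ("G61a", "Is", "SupervisorsOffice")}
    expected = {("G61a", "Is", "Office")}
    suggestions, added, deleted = verify(user, expected, foundation)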

3.1 Modelling Spaces

Figure 4 shows the foundation ontology for one region using the VCM system. The panel on the left hand side shows all the concepts known to the system. Colour is used to show the category these concepts belong to: places (black and blue); functions (red) and personal activities or annotations (green). The semantics of the system-supplied concepts is known. The user can define their own terms but the system will not be able to associate semantics with user-defined terms. The right hand side gives a list of relationships the user may define between concepts. Again colour is used to give information about the meaning of the linkage. Geometric relationships such as Contains or Adjacent are shown in black, room functions (Is) are shown in red and terms for use in personal ontologies are shown in green. The centre area of the screen is where the user lays out concepts and links them together in concept maps. They are provided with a foundation map of concepts which they can add to, modify or delete (although we do not show an example of deletion).


It is critical to provide the foundation map for two reasons. Firstly, this saves the user the tedium (and potential errors) in defining parts of the ontology that are generally common. This is defined by the architecture of the building and spaces as well as the location of sensors and services. This will generally change slowly. This is also important in providing consistent images of the core elements of MyPlace. Secondly, we want to ensure that the number of personalised elements in a MyPlace ontology will be small enough for a user to readily distinguish and check those elements.

Figure 4: The VCM system showing its foundation ontology.

Figure 4 shows the Locations known to MyPlace placed in default positions. This arrangement of places is done the same regardless of who the user is. Within the University of Sydney (denoted Usyd) there are two buildings which the system knows about, Madsen and Storie Dixon. Rooms within these buildings are labelled by their numbers: G70, G71, G92 ... Lab381a, Lab381b. The author defined the map only with places relevant to the current MyPlace implementation.

The next stage in the mapping process is the automatic labelling of rooms based on the type of user. Similar to Role Based Access Control, the user's access is determined by which groups they belong to. Figure 5 shows the concept map for Fred, a male staff member. We can see labels on all the shared rooms that a staff member would be expected to know about. The Postgraduate workspace and undergraduate laboratory are shown as staff are expected to supervise research students and teach some undergraduate classes.

Figure 6 shows the foundation map generated for Jane, a new female postgraduate student. There are two differences from Fred's map: firstly, the undergraduate laboratories are unlabelled as new postgraduates are not expected to do any teaching. Secondly, instead of labelling the male toilets the female toilets are labelled.

The final automatically generated concept map is shown in Figure 7. This is for John, a visitor coming to give a guest seminar. In this case he is only given semantic labels on the Seminar Room and the Front Office.

To this point the concept maps are all foundational. They might be presented to a user as part of the explanation for what is displayed by the MyPlace interface of Figure 3. We now describe how the individual user can amend a concept map with their personal ontological information. Figure 8 shows Fred's personal annotations. He considers himself in his office when in G61a and he does research and teaching in the places marked as MyLabs. When he is in the seminar room (G92) or the meeting room (G46) he expects to be in meetings. As a result of being given this information, MyPlace can reason that in either of these locations Fred is uninterruptable as he is in meetings. (He can override this behaviour if he wishes, but that part of the system is outside the scope of this paper.) Fred has specified that he is to be considered at work when he is located in the more general location of the Madsen building rather than a specific room. This allows the system to aggregate location information and return the more general concept, "at work". MyPlace uses this in the reporting of Fred's location, e.g. user A is only informed whether Fred is at work or not.

47

Figure 5: VCM showing annotations based on the type of user, in this case Fred is a male staff member.

Figure 6: VCM showing annotations based on the type of user, in this case Jane is a new female postgraduate student.


Figure 7: VCM showing annotations based on the type of user, in this case John, a visitor giving a seminar.

Jane annotated her diagram slightly differently in Figure 9. Her office is G61a (the postgraduate workspace). She has linked the InMeetings concept to an additional room - G61a, her supervisor's office. She believes that when located in that office she will be discussing research with her supervisor and thus wants the system to treat her as being in a meeting. She has also created a new term SupervisorsOffice, which was not on the provided list. The system will not be able to automatically reason based on the creation of this concept, but she can define rules using it to allow this later. After the user has made their additions and changes to the concept map it is verified as was described in the introduction to Verified Concept Mapping. This allows the system to flag map features that were unexpected as well as potential omissions the user may have made. By consulting rules entered by the author of the foundation map, the system can detect anomalies such as users having meetings in the toilets or not choosing an office. We have similar ontologies for people and devices which have been omitted for space reasons.

4. RELATED WORK

There are a number of related projects which deal with some of the elements of MyPlace. We now briefly outline other work which deals with aspects of the invisibility problem, the modelling of users, devices, activities and places, and the personalisation of information delivery in pervasive computing systems. The Digiscope described in [12] presents a system where the user is given an augmented reality view of invisible information in the environment by looking through a movable large semi-transparent perspex window. Chalmers et al. [2] have constructed a framework for context mediation, where they adapt the content of documents based on preferences about its semantic and syntactic properties. At Dartmouth College researchers have worked on an event based system for processing context data by defining processing operators and connecting them in an acyclic graph [3]. In our work we are attempting to allow users to specify what they are interested in having displayed to them using a similar idea. Other approaches to modelling the world have not taken a uniform approach. CMU's project Aura [7] models entities in the world as either devices, people, areas or networks. The NEXUS project [1] focuses more on location and has a much stricter schema. Huhns and Stephens [6] believe that having people make personal ontologies could assist them with managing and finding their documents, both real and electronic.

5. CONCLUSIONS AND FUTURE WORK

The MyPlace system currently works for users' personal devices within our department. We are currently working on ways for the user to further customise which elements of the world model they are interested in.


Figure 8: VCM showing the user's personal annotation. This map shows Fred's personal annotations, such as where he considers himself to be at work, in meetings or in his office.

Figure 9: VCM showing the user's personal annotation. This map shows Jane's personal annotations, such as where she considers herself to be at work, in meetings, in her office or discussing research with her supervisor.


Acknowledgements

The authors would like to acknowledge the support of the Smart Internet Technology CRC.

REFERENCES

[1] Martin Bauer, Christian Becker, and Kurt Rothermel. Location models from the perspective of context-aware applications and mobile ad hoc networks. Personal and Ubiquitous Computing, 6(5-6):322-328, December 2002.

[2] Dan Chalmers, Naranker Dulay, and Morris Sloman. A framework for contextual mediation in mobile and ubiquitous computing applied to the context-aware adaptation of maps. Personal and Ubiquitous Computing, 8(1):1-18, 2004.

[3] Guanling Chen and David Kotz. Context aggregation and dissemination in ubiquitous computing systems. In Proceedings of the Fourth IEEE Workshop on Mobile Computing Systems and Applications, pages 105-114, June 2002.

[4] Laurent Cimolino, Judy Kay, and Amanda Miller. Concept mapping for eliciting verified personal ontologies. Special Issue on Concepts and Ontologies in WBES of the International Journal of Continuing Engineering Education and Lifelong Learning, accepted April 2003, to appear, 2004.

[5] Brian R. Gaines and Mildred L. G. Shaw. Knowledge acquisition and representation techniques in scholarly communication. In Proceedings of the 13th annual international conference on Systems documentation, pages 197-206. ACM Press, 1995.

[6] Michael N. Huhns and Larry M. Stephens. Personal ontologies. IEEE Internet Computing, 3(5):85-87, September-October 1999.

[7] G. Judd and P. Steenkiste. Providing contextual information to pervasive computing applications. In Proceedings of the IEEE International Conference on Pervasive Computing (PERCOM), pages 133-142, March 2003.

[8] Judy Kay, Bob Kummerfeld, and David J. Carmichael. Consistent modelling of users, devices and environments in a ubiquitous computing environment. Technical Report TR 547, School of Information Technologies, University of Sydney, http://www.it.usyd.edu.au/research/tr/tr547.pdf, June 2004.

[9] J. D. Novak. The theory underlying concept maps and how to construct them, http://cmap.coginst.uwf.edu/info/ (visited July 2004).

[10] J. D. Novak. Learning, creating, and using knowledge: Concept maps as facilitative tools in schools and corporations. Lawrence Erlbaum Associates, Mahweh, NJ, 1998.

[11] J. D. Novak and D. B. Gowin. Learning how to learn. Cambridge University Press, 1984.

[12] Alois Ferscha and Markus Keller. Digiscope: An invisible worlds window. In Video in 5th International Conference on Ubiquitous Computing, October 2003.


OWL-based location ontology for context-aware services

Thibaud Flury, Gilles Privat, Fano Ramparany
France Telecom, R&D Division, Technologies/ONE/Grenoble
[[email protected]]

ABSTRACT

Diverse location-sensing technologies must be integrated to meet the requirements of generalized location-awareness in ubiquitous computing environments. A formal and comprehensive ontology-based modelling of location makes it possible to derive and integrate consistent high-level location information from sparse, heterogeneous location data. Defined this way, location information may provide a contextual common denominator for the semantic modelling of services in a service-oriented architecture.

Keywords: Location-based services, context-awareness, ontology, semantic modelling, service-oriented architecture

INTRODUCTION

Beyond its established use in run-of-the-mill location-based services for mobile telecommunications, location provides the primary source of context data for ubiquitous computing environments. This expanded use of location information sets stronger requirements on the integration of location data, making it necessary to define it in a much more general and formal way than has been done so far. A flurry of interest around the semantic web [1] and semantic web services [2] has made it superfluous to advocate the relevance of semantic modelling and ontology-based languages. This interest arose from the original Web, when machine-based (rather than human-centered) understanding of information was addressed. It has since spread into the realm of service-oriented architectures [3] (SOA), to address similar concerns with networked services having to "understand" what other services may provide before they are able to work together, if not preconfigured to do so. The SOA paradigm has already made inroads into the domain of ubiquitous computing, being applied to the inter-operation of networked devices. Whether hardware devices or purely software services are addressed, ontology-based modelling is best used when circumscribed within simple and neatly bounded universes of discourse, which may limit its practical applicability. Early service advertisement and discovery architectures, such as UPnP or Salutation [4], have tackled the apparently mundane task of pre-defining, at a level that cannot even begin to be called semantic, all potential features of any computer peripheral (not mentioning consumer appliances…). It has proved to be a never-ending, thankless endeavour.

To escape this modelling dead-end, the physical grounding inherent in context information, as used in ubiquitous computing, provides a natural common denominator. Physical context information, among which location is prominent, is potentially shared between the widest variety of devices, making it possible to both circumscribe the scope and raise the level of generality of the corresponding semantic modelling. Whereas regular software service or device ontologies are necessarily domain-specific, ontologies describing the generic concepts of physical context information, such as location, are potentially shared among a large spectrum of services that may take this physical context information into account for optimally adapting to it. For this, specific service features relevant to concepts defined in domain-specific ontologies have to be mapped, cross-referenced and integrated with more generic location information based on the proposed generic ontology. Our view is that this mapping is best addressed at the most general level of relevant ontologies, by drawing inferences based on generic rules attached to these ontologies. This paper attempts to define this semantic "common ground" of location information for device-based services encountered in ubiquitous computing environments. We first review the common location models, then proceed to show how they can be formally defined within the framework of ontology languages and applied to the integrated management of heterogeneous location information, to be used by various location-aware applications.

ABSTRACTING AND INTEGRATING LOCATION INFORMATION

Location-determination solutions are numerous and highly heterogeneous [5][6]. They may use measurements from such physical modalities as sound, visible light, radio frequencies, or acceleration. They may retain the raw data provided by physical sensors as an all-or-nothing detection, or apply such numeric estimation techniques as multi-lateration or scene analysis with temporal particle filtering [7]. Our approach is to categorize these solutions based on some abstract mathematical model of space according to which they do, implicitly or explicitly, define the location information that they provide as an output.


This level of abstraction makes it possible to uncover common denominator aspects not only between various location-sensing technologies, but also between the needs of various location-based applications.

Location models

We draw a bottom-line distinction between four basic models of space [8].

Geometric models

The first category is based on affine or affine-Euclidean geometry and defines location by way of coordinates, usually chosen as orthogonal (a.k.a. Cartesian) coordinates relative to a given Coordinate Reference System. Technologies such as multi-lateration or scene analysis will usually provide their output according to such a model.

Set-theoretic models

In this model, location is boiled down to a mere notion of an element (entity to be located) being a member of a set (entity with reference to which location is defined), without any metric information attached. Technologies based on proximity detection (such as RFID) or the simplest technologies based on cellular mobile networks (such as cell-ID) may be defined as such. Operations such as union, intersection or complement may be used to define reference location sets with respect to one another.

Graph-based models

In generic graph-based models, any relation between entities may be construed as relevant to some more or less abstract concept of « location ». This does include all well-known cases of location relative to physically-grounded networks such as telecom networks (be they infrastructure-based or ad hoc networks), road networks, water-pipe networks, or any relation of physical proximity abstracted from its metric dimension. This may also correspond to more abstract networks such as virtual networks overlaid on physical networks, social networks, graphs describing possible routes within a building, etc. An ad hoc location system detecting exclusively relative connectivity or proximity between located entities, without any quantitative information attached, could provide its output as defining edges in a relative location graph.

Semantic models

In this model, location concepts are defined relative to a given universe of discourse such as architecture, physical geography, political geography, city planning, etc. The perspective is to link a mathematical definition of position (as in the geometric, the set-theoretic or the structural model) to a more human-friendly notion of place, as in [9]. The goal is to automate this process with an explicit definition of the semantics, so that computers in a ubiquitous environment can understand it. Examples of these semantic concepts are given in Figure 5.

Model integration for location management

The neat separation described between these different models is for the sake of this introductory explanation. In all actual cases, location information relevant to these models is tightly interwoven, creating a complex web of location information where the graph-based model is actually a meta-model: all location information may, at the end of the day, be defined as a relation between "location entities". Relations attached to the set-based model may correspond to a hierarchy of inclusion between sets. A graph-based model may also be attached to the affine-Euclidean model: transformations from one CRS to another may be attached to this model, defining a set of relations where one CRS may conventionally be considered "absolute" and the other "relative" when the corresponding graph is a tree with the "absolute" CRS at its root. Conversely, metric or geometric location information may be attached to a location network, corresponding to another mapping between the graph-based model and a metric or affine-Euclidean space.

The process of abstracting raw location measurements into either of these separate models is thus only the first stage of a comprehensive location management policy. A complete location management system should be able to match and maintain up-to-date relations between separate pieces of location information pertaining to these different models in a given environment. This comprehensive location information will be the result of aggregating location information from several technologies, possibly corresponding to different models of space, to make it available to location-aware applications that may also provide queries and expect responses relevant to different models of space.

DEFINING A BASIC LOCATION ONTOLOGY

We map here the models defined above informally to the meta-model and classes of a formal ontology.

Geometric model

The geometric model is based upon three elementary concepts (Figure 1):

Shapes are 2D or 3D manifolds defining geometrically the physical locus, by way of a contour, of something that may be either a located object (e.g. a PDA) or an entity with regard to which other objects are located.

Coordinate Reference Systems are used to define these shapes in a relative fashion. As already mentioned, a CRS is defined relative to another CRS and ultimately by reference to a primary, supposedly "well-known" CRS, by way of a chain of transitive Geometric Transformations. The "ultimate" reference CRS may be a projection-based Cartesian CRS (such as the Lambert System used in France) or an oblate spheroidal geodetic coordinate system such as used with the WGS84 system.
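As a rough illustration (the names below are ours, not taken from the ontology), the chain of transitive transformations between coordinate reference systems described above could be sketched as follows:

    # Sketch of the geometric model: a CRS is defined relative to a parent CRS by a
    # transformation; coordinates are resolved by chaining transformations up to the
    # primary ("well-known") reference system. 2D translations only, for brevity.

    class CRS:
        def __init__(self, name, parent=None, transform=None):
            self.name = name
            self.parent = parent            # None for the primary reference CRS
            self.transform = transform      # function mapping local coords to parent coords

        def to_primary(self, point):
            """Resolve a local point into the primary CRS by transitive transformation."""
            crs, p = self, point
            while crs.parent is not None:
                p = crs.transform(p)
                crs = crs.parent
            return p

    world = CRS("primary reference CRS")                       # assumed primary reference
    building = CRS("building frame", world,
                   lambda p: (p[0] + 100.0, p[1] + 250.0))     # offset into the primary CRS
    floor2 = CRS("floor-2 projection", building,
                 lambda p: (p[0] + 3.0, p[1] + 7.5))           # offset of the floor plan

    print(floor2.to_primary((1.0, 2.0)))    # (104.0, 259.5)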


Ontology diagram legend

Figure 1 : Basic concepts of geometrical location ontology, with example instances

Set-based model

The set-theoretic model is quite naturally based upon the two key concepts of Set and Element, and is, for all practical purposes (foregoing theoretical technicalities), drawn directly from mathematical Set Theory. An Element can be inside particular Sets, and a Set may contain any number of Elements. Two other concepts are linked to these primary ones: the Set-based Operation can define a Set by an operation (union, intersection, complement and such) combining existing ones. The Set-based Relation can be used to define the relation between two particular sets (a set could contain another one; they could be disjoint or have a non-null intersection).

Graph-based/structural location models

As mentioned before, this model is used at three different levels, between which the distinction has to be made clear:

Properties of concepts (classes) in the ontology are basically defined by relations (such as the subsumption relation), corresponding to a meta-meta level.

At a slightly less general level, all location information is defined by relations between "location entities", corresponding to a meta-level of relations (e.g. the relation "isMemberOf" between an element and a set, or the relations defined between coordinate reference systems by way of geometrical transformations or their matrices). Finally, specific relations such as proximity or connectivity define their own concepts of "structural location". The meta-concepts for these are defined by the classes GraphEdge and Node of the structural location model in our location ontology (Figure 2). These define what we call the structural location model. Sub-concepts may be derived from this by specialization according to the nature of the graph (e.g. a graph defining possible passages between rooms in a building), or to specify a directed or non-directed graph (in which case generic graph edges are specialized into arcs or non-directed edges).

Figure 2 : Basic concepts of graph-based location ontologies, with example instances

Semantic models

At this level, location concepts may no longer be derived from generic ontologies and have to be defined in a domain-specific way, such as the knowledge base for geographic location information defined in [10]. Two meta-concepts are nonetheless defined: LocatedEntity and ReferentElement. A generic location concept (as defined above at the meta level) may correspond to a LocationRelation between two instances or two subclasses of these.
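A small sketch (with our own, purely illustrative names) of how the set-based and graph-based location models above could be instantiated side by side:

    # Set-based model: membership of elements in reference location sets,
    # plus a derived set (a floor as the union of its rooms).
    rooms = {
        "B215": {"printer-LP150", "John"},
        "corridor-2": {"coffee-machine"},
    }
    floor2 = set().union(*rooms.values())

    def is_member(element, location_set):
        return element in location_set

    # Graph-based (structural) model: rooms as nodes, passages as edges.
    adjacency = {
        ("B215", "corridor-2"),                  # a door connects the two
    }

    def connected(a, b):
        return (a, b) in adjacency or (b, a) in adjacency

    print(is_member("John", rooms["B215"]), connected("corridor-2", "B215"))  # True True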



Figure 3 : Basic concepts of Semantic model, with example instances

RELATIONS BETWEEN MODELS AND ONTOLOGY INTEGRATION

In trying to define the semantic model, it appears that all other models can be structured in a hierarchical fashion with this model at the top. The semantic model will provide the basis for an interpretation and specialization of all generic concepts defined in other models. Thus a basic location set will become a room by incorporating concepts from the architecture semantic model, or a CRS may become a geographical projection coordinate system by incorporating concepts from physical geography.

FORMAL ONTOLOGY SPECIFICATION

As our discussion above suggests, location information is quite complex in terms of the number of concepts it requires to be correctly described, in terms of the heterogeneity of domains these concepts belong to, and in terms of the interrelationships among these concepts. Expressive modelling languages such as frame-based knowledge representations, logic-based representations or semantic networks are one step towards managing this complexity. Identifying and modelling the appropriate ontologies as a foundation for expressing location information is a further step along the same direction. We have used the OWL [12] language, recently adopted by the W3C as a standard for the semantic Web and semantic Web services. Historically, this language resulted from an evolution of the DAML+OIL ontology language [11], which itself already extended RDF and RDF Schema with richer primitives. OWL goes beyond these languages in its ability to represent machine-interpretable content on the Web in an easier way. It builds upon XML by using the XML syntax as a possible serialization. In this first version we have mainly focused on the modelling of indoor spaces, and more specifically office buildings. Components of our indoor office building model include concepts such as corridors, rooms, floors, buildings, staircases and elevators. The room concept has been specialized into the sub-concepts office, meeting room and conference room. Instantiating these ontologies enables a structured representation of location information, with a clear commitment about its semantics. This is necessary for client applications to get a perspective on the environment with a meaning that the context management service and any client can agree with.
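Since the OWL serialization originally shown in Figure 4 is not reproduced here, the following sketch (using the rdflib library; the class and property names are our assumptions, not the exact ontology) indicates how a geometric concept and an example instance might be expressed and serialized:

    from rdflib import Graph, Namespace, Literal, RDF, RDFS
    from rdflib.namespace import OWL

    LOC = Namespace("http://example.org/location-ontology#")   # placeholder namespace

    g = Graph()
    g.bind("loc", LOC)
    g.bind("owl", OWL)

    # Classes of the geometric model (assumed names).
    for cls in (LOC.Shape, LOC.CoordinateReferenceSystem, LOC.GeometricTransformation):
        g.add((cls, RDF.type, OWL.Class))
    g.add((LOC.definedIn, RDF.type, OWL.ObjectProperty))
    g.add((LOC.definedIn, RDFS.domain, LOC.Shape))
    g.add((LOC.definedIn, RDFS.range, LOC.CoordinateReferenceSystem))

    # An example instance: the contour of office B215 in a floor-level reference system.
    g.add((LOC["floor2-CRS"], RDF.type, LOC.CoordinateReferenceSystem))
    g.add((LOC["B215-contour"], RDF.type, LOC.Shape))
    g.add((LOC["B215-contour"], LOC.definedIn, LOC["floor2-CRS"]))
    g.add((LOC["B215-contour"], RDFS.label, Literal("Contour of office B215")))

    print(g.serialize(format="xml"))    # RDF/XML serialization, as in Figure 4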



Figure 4 : Example serialization of geometrical concepts using OWL

Example of specialized ontology

Very general concepts are not usable directly; they should be specialized according to the fields for which location information is relevant. It is essential to particularize the top-level ontology into sub-domains like urbanism, architecture, political geography, and so forth. In our case of indoor location, the architectural division of space seems to be relevant. The set-based division of an indoor space is related to architectural design. The architectural map is constituted of a set of 2D projections corresponding to each floor of the building. On these projections the walls and doors are drawn, and space is naturally divided into rooms and corridors. The doors on these maps indicate the interconnections between these elements. This very first study puts a specialization of the location ontology into concrete form. At the geometric level, space is described for each floor by a proper 2D rectangular coordinate reference system and a set of simple polygons. These are the specialization of the generic coordinate reference system and the geometrical abstraction of shape. These 2D reference systems are the result of the projection of the 3D reference system related to the building; this is a specialization of the affine-Euclidean transformation.

generic coordinate reference system and the geometrical abstraction of shape. These 2D reference systems are the result of the projection of 3D reference system related to the building, this is a specialization of the Affine Euclidean transformation.

underlying semantics, they can use the location service in the most effective way..

Implicit context information The information that can be inferred is far broader than that explicitly stored. Deriving this information and storing it explicitly could reveal an efficient strategy as far as the time for retrieving this information is concerned, but will lead to consistency and overloading problems as more and more information will be stored. It is especially valid when it comes to location information, whose can be huge.

Polygons describing rooms and corridors are also a specialisation of the set concept. These sets are mutually exclusive and the result is that a floor can be defined as the union of all rooms and corridors within it. More complex rooms (that cannot be described using a simple polygon) can also be described using constructive area geometry with set based operations. The building itself can also be described as the union of all the floors. Inference rules can be used to deduce set based relations like non-void intersection, containment or disjunction, a room might be inside the building but not inside a particular floor.

(inside car1 parking1) (inside suitcase1 car1) (inside ?object parking1)

The architect map can also be used to infer some structural relations of space. Each rooms and corridors can be interpreted as nodes of a graph modelling their adjacency, when two of these elements are only separated by a wall (or when two sides of polygons have a part in common) the element are linked by an edge. The drawing of the doors can be used to model route between rooms and corridors.

?object : {…., suitcase1,….} if (inside ?object1 ?object2) and (inside ?object2 ?object3) we can derive from these two propositions that (inside ?object1 ?object3)

Context information completion Context information is incomplete. We usually use as many sensors as we need, but there are some cases where we need more sensors than are actually available. For example, some might be out of order or used by another application. There is also the case where there simply isn't any sensor available from which you can directly draw the context information you actually need. For example if the only location sensor you've got in a room is a weightsensitive pad located at its doorstep, and we've been notified the event that somebody weighting less than 30 kgs has entered the room. We might infer that a child is probably inside the room, although there aren't any indisputable evidence that the person is a child. We've simply drawn this inference from the fact that children usually weigh less than 40 kgs.

At the semantic level, the sets (currently rooms and corridors) may gain some sense: rooms may be offices, meeting rooms or other points of interest like toilets, using the symbols on the map. The rooms might also have a name, and office rooms might have an owner. From the very simple architectural map, an entire taxonomy of indoor space can be formalized in this way.

MODELLING OF RULES AND INFERENCES
Adding reasoning capabilities for enabling inference goes one step beyond the mere definition of taxonomies. By reasoning we refer to the ability to dynamically chain a sequence of inference steps that build intermediary results from existing information, which ultimately results in relevant new information. There are several reasons why we might need such functionality, and we now elaborate on these points.

Bridging of ontologies
Ontology representation languages provide only limited reasoning mechanisms. These mechanisms might prove insufficient for answering queries across different domains. Here is an example where we need more than the standard ontology concept subsumption mechanism to answer a query: suppose that we know that printer LP150 is located at Cartesian coordinates (x=5, y=5), and that office B215 has a rectangular shape whose corner points have coordinates (x=0, y=0), (x=0, y=10), (x=10, y=0), (x=10, y=10) in the same Cartesian reference system. Clearly printer LP150 is right in the middle of room B215. This information can be derived using geometrical reasoning.
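A minimal sketch of such geometrical bridging, assuming axis-aligned rectangular rooms and the coordinates from the example above; a general point-in-polygon test would be needed for arbitrary room shapes.

def inside_rectangle(point, corners):
    """True if the point lies within the axis-aligned rectangle spanned by the corners."""
    xs, ys = zip(*corners)
    return min(xs) <= point[0] <= max(xs) and min(ys) <= point[1] <= max(ys)

rooms = {"B215": [(0, 0), (0, 10), (10, 0), (10, 10)]}   # corner points from the example
printers = {"LP150": (5, 5)}

for name, pos in printers.items():
    located_in = [room for room, corners in rooms.items() if inside_rectangle(pos, corners)]
    print(name, "is located in", located_in)              # LP150 is located in ['B215']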

Needs for reasoning and inference rules
Using these logical inference mechanisms to reason on the data is the most useful aspect of the advocated semantic modelling of location information. The primary goal of our study is to create a software system to manage location information. The ontology serves only as a repository for a knowledge base of location information. With inference rules and reasoning capability, the location service can automate some essential processes to reuse the information. It can collect the raw data from heterogeneous sensing technologies, interpret the corresponding location information, and check it or aggregate it with other sources of information. A location management system can also use the reasoning capacities to answer the needs of client applications: it can understand and respond to location requests, and generate events depending on specific complex conditions. In fact, location sensing technologies and client applications each have their own, often implicit, representations of space. If they can expose and share the

Context aggregation
Aggregation is somewhat similar to completion, with the difference that the inference is sound. Indeed, we get context information from disparate sensor measurements. We might be interested in information that can be jointly built upon two or more sensors rather than in information drawn from each sensor individually. For example, if we have been notified by an identification sensor such


as an RFID tag reader that John has entered room B215, and a vision-based detection and tracking sensor detects that a person is sitting at a table in this room, we can conclude that John is sitting at the table in room B215.
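A minimal sketch of this aggregation rule, with hypothetical event structures for the two sensors; the point is only that the join on the shared room yields a conclusion neither sensor could deliver alone.

rfid_events   = [{"person": "John", "room": "B215", "event": "entered"}]
vision_events = [{"room": "B215", "posture": "sitting", "where": "table"}]

# join the two sensor streams on the room they refer to
conclusions = [
    f'{r["person"]} is {v["posture"]} at the {v["where"]} in room {r["room"]}'
    for r in rfid_events for v in vision_events if r["room"] == v["room"]
]
print(conclusions)   # ['John is sitting at the table in room B215']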

CONCLUSION
We are currently in the process of implementing a location management system based on these concepts of integrating joint models of space. It will be part of a service-oriented architecture for managing services provided by various devices in an indoor environment, to enable semantic composition and adaptation of these services to location-related information. The semantic modelling described here will be implemented in a separate layer to provide context-adaptive service composition capabilities.

Having reviewed the situations where we might need to reason about context, we now analyze the mechanisms needed to perform the reasoning. This will help us identify which reasoning or inference engine we could use for implementing our location context management system.

REFERENCES

When reviewing the types of queries the localisation context management service has to fulfil, we roughly identify two types of queries:

[1] Tim Berners-Lee, James Hendler, Ora Lassila. "The Semantic Web". Scientific American, 17 May 2001.
[2] KnowledgeWeb NoE, http://knowledgeweb.semanticweb.org
[3] M.P. Papazoglou, D. Georgakopoulos. "Service-Oriented Computing". Communications of the ACM, October 2003.
[4] G. Richard. "Service advertisement and discovery: enabling universal device cooperation". IEEE Internet Computing, September-October 2000.
[5] Jeffrey Hightower, Gaetano Borriello. "Location Systems for Ubiquitous Computing". IEEE Computer, August 2001.
[6] Mike Hazas, James Scott, John Krumm. "Location-Aware Computing Comes of Age". IEEE Computer, February 2004.
[7] Gilles Privat, Thibaud Flury et al. "Position-Based Interaction for Indoor Ambient Intelligence Environments". Proceedings of EUSAI 2003, 3-4 November 2003, Eindhoven, The Netherlands.
[8] Gilles Privat, Thibaud Flury. "An infrastructure template for scalable location-based services". Proceedings of SOC 2003, pp. 214-217, 15-17 May 2003, Grenoble, France.
[9] Jeffrey Hightower. "From Position to Place". Proceedings of the Workshop on Location Aware Computing (Ubicomp 2003), October 2003, Seattle, USA.
[10] Dimitar Manov, Atanas Kiryakov, Kalina Bontcheva et al. "Experiments with geographic knowledge for information extraction". Proceedings of the Workshop on the Analysis of Geographic References (HLT-NAACL 2003), May-June 2003.
[11] DAML+OIL (March 2001) Reference Description.
[12] OWL (February 2004) Reference Description, http://www.w3.org/TR/owl-ref.

Queries inquiring about the truth of some properties. Examples are: "Is there a printer in room H250?", "Is printer X340 in building C?"
Queries expecting an object to be returned. Examples are: "Where is John?", "Which room is printer X340 in?", "Which building is printer X340 in?", "Which (x,y,z) position is printer X340 at?", "Which printer is the closest to position (x=300, y=200, z=400)?", "Which rooms have a printer?", "Which printer is in room H250?", "Which rooms have both a printer and a telephone?"
Handling such types of queries can be efficiently supported by a theorem prover or a backward reasoning strategy. More specifically, a theorem prover organizes the control of the reasoning process this way: in trying to establish the truth of a property, it looks into the knowledge at hand to check whether this property is explicitly there or not. If it is there, the property is true. If not, it attempts to derive it from the knowledge at hand using a basic inference step. Eventually, if a promising inference step requires a property not explicitly stored, it tries to establish its truth (this is a recursive process).
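The following Python sketch illustrates this backward-reasoning control loop on a tiny, hypothetical knowledge base with one containment rule and no loop detection; a real theorem prover would of course be far more general.

facts = {("located_in", "printer_X340", "room_H250"),
         ("part_of", "room_H250", "building_C")}

def prove(goal):
    """Backward chaining: look the goal up, otherwise try to derive it recursively."""
    if goal in facts:
        return True
    pred, obj, target = goal
    if pred == "located_in":
        # rule: located_in(x, b) <- located_in(x, r) and part_of(r, b)
        return any(prove(("located_in", obj, room)) and prove(("part_of", room, target))
                   for (p, room, t) in facts if p == "part_of" and t == target)
    return False

print(prove(("located_in", "printer_X340", "building_C")))   # True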


Figure 5: Semantic location concepts with domain-specific derivations

Figure 6: Full example with relations between different models


Semantic User Profiles and their Applications in a Mobile Environment

Andreas von Hessling, Thomas Kleemann, Alex Sinner
Institut für Informatik, Universität Koblenz-Landau, D 56070 Koblenz

machine understandable and are also strongly related to Description Logics (DL) [4, 5]. Therefore they can be processed by DL reasoners, like FaCT [12] or RACER [11].

ABSTRACT

In this paper, we propose a peer-to-peer based mobile environment consisting of stations providing semantic services and users with mobile devices which manage their owner’s semantic profile. In this environment the distributed computing power replaces centralized profile management. The service stations broadcast their services in a description logics based ontology language. When users equipped with their mobile device enter the range of a station, the semantic service is matched against the semantic user profile. The matching framework supports matching of services against user profiles, but is easily extended to matching services against other services or user profiles against other user profiles.

These two developments allow us to devise new applications which take into account the new opportunities offered by semantic information representation and the combination of computing power and ubiquitous availability. In this paper, we will turn our attention towards the concept of semantic user profiles and their possible applications in a mobile environment. In such an environment, places that offer services are equipped with Bluetooth-enabled computers to broadcast their services to passing people with Bluetooth-enabled mobile devices which store their owner's profile. These computers and mobile devices need not be connected to the Internet; all necessary communication between them is done via short-range wireless Bluetooth connections. On page 2, we will discuss the mobile environment in detail.

Author Keywords

Mobile environment, peer-to-peer, user profiles, semantic modeling, matching, description logics, ontologies

INTRODUCTION

The presented work is motivated by two aspects: the development of powerful mobile devices and the semantic web.

In this environment, there is no central mediator managing all the services and users, so interoperability aspects have to be considered. If we want a multitude of services and users to interoperate and communicate in a coherent way, we need a common language. In our case, this will be a description logics based ontology language similar to the languages that are used in the semantic web. Service descriptions and user profiles are encoded in this semantic language, which enables us to use the same user profile with a multitude of different services. This makes sense from two different viewpoints: first, it relieves us from having to create a new user profile from scratch (a non-trivial task) for every service provider, and second, the same interests in the profile can be relevant in a wide variety of services, possibly originating from many different providers.

Current advances in the domain of mobile devices, ranging from personal digital assistants (PDAs) to smart phones, enable a new realm of applications. They have the computing power and storage capacity of last decade's high-performance computers, and yet are small enough to carry around all the time. Today, mobile phones are widespread consumer devices which can be operated painlessly by the majority of the population. It is a safe bet that in the near future, regular mobile phones will have even more computing power than today's smart phones. The development of the semantic web provided us with the necessary tools to handle computer-understandable semantics. Semantic web pages are enriched with semantic annotations, usually encoded in some XML-flavored ontology language like DAML+OIL [13] or OWL [3]. These languages are

For illustration purposes, consider the following example: Bob is a person and owns a Bluetooth-enabled PDA which manages his user profile. (See page 2.) Bob is a devoted cinephile. Most of his free time, he goes to the movies or rents DVDs (no VCRs) at the local video store. Additionally, he is always eager to meet new people who share his hobby. In all of the above scenarios (movie theater, video store and community), Bob uses his PDA with one single semantic user profile to get recommendations for services he is likely


to use. The user profile manages facts about which kinds of movies he likes, but also allows for statements like 'I want to see epic movies only in the cinema' or 'I am only interested in renting or buying DVDs, not VCRs'. Services, on the other hand, might state a list of movies to be shown (or rented), together with information about their classification and their medium. If a service is matched positively against the interests stated in the user profile, the user will be notified about it and can decide whether to use it. The salient feature of semantic user profiles is that they can be used in a wide variety of completely different applications. The user carries his profile along with him and is independent of any specific service provider. Figure 1 visualizes the above scenario.

client can however also play the role of a service provider, for example when offering 'dating services'. So the environment is not hierarchically organized. Rather, it can be seen as a mobile wireless peer-to-peer network, even if the purpose of some 'peers' is only to offer services (see also Figure 1). This topology has several advantages over other approaches requiring a central server:
• Independence: It is possible to set up access points for semantic services in the remotest parts of the world. No Internet connection is required. All we need is a mobile device with wireless connectivity and a semantic user profile, and another device which offers semantic services over the same kind of wireless connectivity.
• Cost Effectiveness: Internet connections from mobile devices are still expensive. Since we are independent from the Internet and any service providers to use semantic services, these costs do not apply.
• Privacy Management: A major drawback of most centralized approaches is that very personal data about many people is collected on a server. The people concerned have no control whatsoever over what exactly is stored and who has access to this data. In our approach, all data about the user is stored on the owner's mobile device. For most applications, all the computations using this data are done locally, so it is not necessary to send it to other peers. Should some applications require this, the user has the power not to allow it, or to disclose only the parts approved by her.

Figure 1. A scenario of related semantic services

In the following, we describe the requirements and features for a semantic mobile environment before we elaborate on the concepts of semantic services and semantic user profiles.

A SEMANTIC MOBILE ENVIRONMENT
The Big Picture

The vision of the semantic web, as described in [7], was dominated by the idea of semantic web agents. It was to create an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.

• Scalability: Since the management of all user profiles is done on their respective mobile devices, no central profile database is required. New mobile clients can be added to the system at will, without encountering any scalability issues.

Unfortunately, despite enormous efforts in the field, the semantic web has yet to happen. Up to now, the sheer amount of data on the world wide web and the lack of standard ontologies prevent the world wide web from evolving into the semantic web. Before these obstacles are overcome, the only viable solution is to develop small isolated semantic subnets.

On the other hand, some challenges arise from the intended setting. The increase in hardware requirements on the mobile side restricts the use to modern devices. Only recent handheld devices and smartphones have enough computing and storage resources for on-device user profile management and reasoning support. We are however confident that the mobile industry will take care of this issue by developing more powerful devices. Another issue is that most Bluetooth devices allow only seven open connections at once. Should a mobile client get service offers from more than seven parties at once, this might cause problems. Due to the limited duration of the matching process, this does not impose severe limitations. And finally, no description logic reasoner is available (yet) for mobile devices. Since computing resources are very limited on mobile devices, no off-the-shelf reasoner can be used. We are working on an implementation of a description logic reasoner for J2ME (Java 2 Mobile Edition) to overcome this situation.

Fortunately, in this work we will not deal with a semantic web approach as described above, even if there is a close relationship. Instead of the semantic web, we will consider the 'real' world, where people equipped with high-end mobile devices roam freely in a ubiquitously computerized environment. There is no need for central servers or agent technologies, since people with their mobile devices take on the role of semantic web agents. They go to places like video stores, movie theaters, railway stations, etc., where they can use the semantic services offered.

System Topology

Our mobile environment mainly consists of service providers and consumers with mobile clients who use them. Every


In the following, we will describe in more detail the concepts of semantic services and semantic user profiles. Before we can do so, we first have to define a language to express semantics.

to extend the definition with descriptions about the service provider and maybe service modalities, but in the scope of this paper we will stick to this simple definition. Together with the vocabulary ontology (Figure 2), we are able to express complex services for movie theaters, video stores and persons (in the community scenario) such as:

Description Logics as a Semantic Language

The idea of using Description Logics as a semantic language is not new [5]. In this paper we use the expressiveness of the DL ALCHI, which is the standard DL ALC extended with role hierarchies and inverse roles. Syntax and semantics of ALCHI are amply described in [4].

VideoStoreService1 ≡ ∃offers.(Movie ⊓ ∃hasGenre.SciFi ⊓ ∃hasMedium.VCR)
CinemaService1 ≡ ∃offers.(Movie ⊓ ∃hasGenre.Epic ⊓ ∃hasMedium.CinemaScreen)
BobService1 ≡ ∃demands.(Movie ⊓ ∃hasGenre.Fantasy)

In our mobile environment, we consider semantic services, semantic user profiles and concepts to which the services and profiles refer. The ontology about the concepts in the real world can be considered as a general vocabulary and is independent from the system itself, while the service and user profile ontologies describe specific capabilities of their respective components. We expect all participants of the service to incorporate the shared parts of the ontology. Figure 2 shows a sample ontology for the movies example in DL notation.

Thing ⊑ ⊤
Movie ⊑ Thing
Genre ⊑ Thing
Medium ⊑ Thing
DVD ⊑ Medium
VCR ⊑ Medium
CinemaScreen ⊑ Medium
Epic ⊑ Genre
Fantasy ⊑ Genre
SciFi ⊑ Genre

The first service from the video store offers a SciFi movie on a VCR tape. The second service from the cinema offers an epic movie shown on a cinema screen. Bob's service asks for fantasy movies.

The services are however of no use without their counterpart, a semantic user profile, which we will now describe.

Semantic User Profiles

A semantic user profile is a description of a user’s interests and disinterests:

Figure 2. A simple movie ontology

This simplistic movie ontology provides us with the necessary vocabulary to express more complex concepts like e.g.

Profile ≡ ⊓ᵢ ∃hasInterest.Interestᵢ ⊓ ⊓ⱼ ∀hasInterest.(¬Disinterestⱼ)

This definition states that the user is interested in all concepts covered by the Interestᵢ concepts without the concepts covered by the Disinterestⱼ concepts. This profile description is quite similar to the one found in [8], but we do without encoding numeric levels of interest. Even though a user profile might contain additional information about the user's identity and possessions or capabilities, we will focus solely on an anonymous semantic description of her long-term interests.

RentalMovie ≡ Movie ⊓ ∃hasMedium.(DVD ⊔ VCR)

which describes a movie that is either on a DVD or a VCR.

Semantic Services

A semantic service is of the following form:

Service ≡ ∃provides.Thing¹

Basically, a semantic service is a description of what is provided. Using different subroles of provides we can also specify types of services. In the scope of this paper, we restrict ourselves to defining two subroles, offers and demands. The semantics of these subroles is defined through the matching of services and profiles, but for now we can say that demands mainly allows users to define short-term interests they want to advertise, while offers is the standard way of providing services. It is possible

For the semantic services and profiles to work together, it is indispensable that the Things offered in the service definition use the same ontology (or vocabulary) as the interests from the profile definition. Figure 3 represents a sample profile of Bob stating that he likes epic movies shown in a cinema but does not like movies on VCR.

Semantic Matching

Having defined our notions of services and profiles, we now introduce our approach to semantic matching. The goal of semantic matching is to determine whether a given profile is semantically compatible with a particular service and, if so, how well the two match. Ontologically speaking, the con-

¹ For a correct DL semantics, we have to additionally restrict the service provided to exactly what is provided with a value restriction: Service ≡ ∃provides.Thing ⊓ ∀provides.Thing. To enhance readability we omit these value restrictions. The same is valid later on for the formalization of semantic user profiles.


services of restaurant agents on a mere semantic basis. In particular, the Agents2Go methodology takes a decentralized approach based on mobile devices which use Internet connectivity to connect to broker agents [10].

Bob ≡ ∃hasInterest.(Movie ⊓ ∃hasGenre.Epic ⊓ ∃hasMedium.CinemaScreen) ⊓ ∀hasInterest.(¬Movie ⊔ ¬∃hasMedium.VCR)

The Agents2Go approach differs from our system in its distinction between service providers (the restaurant agents) and the service requesters. Opposed to this, our application allows for the merging of the service providing and service exploiting functionality.

Figure 3. Bob's profile

cepts describing a user's interests and dislikes have to be compared to the concepts offered by a service²:

General Differences

The main differences to our approach have already been pointed out on page 2. By avoiding central servers and agent technologies, we have created a simple yet powerful system which has many advantages over conventional approaches, namely better privacy management by avoiding the storage of all profiles on a single server, and a cheaper infrastructure by avoiding Internet connections, without giving up the advantages of user profiles and semantic interoperability.

UsersInterest ≡ ∃hasInterest⁻¹.Profile
ServiceOffer ≡ ∃offers⁻¹.Service

Based on this, a concept Match is expressed as the intersection of these two concepts.

FUTURE WORK

Match ≡ UsersInterest ⊓ ServiceOffer

A prototype of this work has been realized [15]. In this prototype, only emulated mobile devices were used and the semantic matching was performed by making a connection to a RACER [11] server using the DIG interface [6].

If Match is empty, the user is not interested in the service. Otherwise, we can check the subsumption relationships between Match, UsersInterest and ServiceOffer to determine a match degree. Li and Horrocks have proposed a matching approach that achieves this task [14].
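To illustrate the control flow only (the actual system relies on a DL reasoner; this flat-set approximation is not one), here is a toy Python sketch in which interests, disinterests and offers are plain sets of concept names, Match is their intersection, and subset tests stand in for subsumption.

def match_degree(profile, offer):
    """Return a coarse match degree between a simplified profile and a service offer."""
    if profile["disinterests"] & offer:
        return "no match (offer violates a disinterest)"
    match = profile["interests"] & offer
    if not match:
        return "no match"
    return "full match" if offer <= profile["interests"] else "partial match"

bob = {"interests":    {"Movie", "Genre:Epic", "Medium:CinemaScreen"},
       "disinterests": {"Medium:VCR"}}
cinema_offer = {"Movie", "Genre:Epic", "Medium:CinemaScreen"}
video_offer  = {"Movie", "Genre:SciFi", "Medium:VCR"}

print(match_degree(bob, cinema_offer))   # full match
print(match_degree(bob, video_offer))    # no match (offer violates a disinterest)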

To realize the system on real mobile devices, we are currently working on an implementation of a DL reasoner for the J2ME platform. Another challenge is the automatic management of the user profile. Currently, facts about interests have to be added manually to the profile. This will become obsolete once the profile management is able to evaluate user feedback on services offered. Other work areas involve the development and adaptation of industrial ontologies to be used for service and profile descriptions. Finally, we plan to evaluate the system by deploying some service stations in cooperation with industry partners and local authorities.

RELATED WORK

In the following, we want to quickly describe some related approaches for semantic matchmaking of services.

DReggie

DReggie [1] is a dynamic service discovery infrastructure targeted at mobile commerce applications that exploits semantic matching using the XML-based DAML (DARPA Agent Markup Language) to describe services. A DReggie Lookup Server to which DReggie Clients submit their services performs the matching process and returns information about matches back to the clients [9, pp. 3-4].

ACKNOWLEDGMENTS

This work is partially funded by the "Stiftung Rheinland-Pfalz für Innovation".

The DReggie matching approach mainly differs from our system by taking into account service attribute priorities, which provide a means for the prioritization of different aspects of a service. In addition to that, it contains an online validation process for services received by the Lookup Server, which performs checks for semantically ill-formed expressions.

REFERENCES

1. http://ebiquity.umbc.edu/v2.1/project/html/id/47/, retrieved on 24th November 2003.
2. http://www.csee.umbc.edu/~oratsi2/myresearch/agentsToGo/, retrieved on 21st November 2003.
3. Grigoris Antoniou and Frank van Harmelen. Web Ontology Language: OWL. In S. Staab and R. Studer, editors, Handbook on Ontologies in Information Systems. Springer-Verlag, 2003.

Agents2Go

The Agents2Go project [2] delivers an agent-based dynamic service discovery and information retrieval system using broker agents. In its application domain, the making of recommendations concerning restaurants, a broker agent mediates the ontology-based queries of personal agents and the

4. Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel-Schneider. The description logic handbook: theory, implementation, and applications. Cambridge University Press, 2003.

² We note that we also allow matching between offering and demanding services, but for simplicity's sake we only cover matching of offering services against user profiles.


5. Franz Baader, Ian Horrocks, and Ulrike Sattler. Description logics as ontology languages for the semantic web.

10. Harry Chen, Anupam Joshi, and Timothy W. Finin. Dynamic service discovery for mobile computing: Intelligent agents meet jini in the aether. Cluster Computing, 4(4):pp. 343–354, 2001.

6. Sean Bechhofer. The DIG description logic interface: DIG/1.1. Technical report, University of Manchester, February 2003.

11. Volker Haarslev and Ralf Möller. RACER system description. Lecture Notes in Computer Science, 2083:701–706, 2001.

7. Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic Web. Scientific American, 284(5):34–43, May 2001.

12. Ian Horrocks. The FaCT system. Lecture Notes in Computer Science, 1397:307–312, 1998.

8. Andrea Calì, Diego Calvanese, Simona Colucci, Tommaso Di Noia, and Francesco M. Donini. A description logic based approach for matching user profiles. In Proc. of the 2004 Description Logic Workshop (DL 2004), 2004. To appear.

13. Ian Horrocks. DAML+OIL: A reason-able web ontology language. In Extending Database Technology, pages 2–13, 2002. 14. Lei Li and Ian Horrocks. A software framework for matchmaking based on semantic web technology.

9. D. Chakraborty, F. Perich, S. Avancha, and A. Joshi. DReggie: Semantic service discovery for m-commerce applications. In 20th Symposium on Reliable Distributed Systems, 2001.

15. Andreas von Hessling. Ontology based profile matching on mobile devices. Bachelor's thesis, February 2004.


Supporting Personalized Interaction in Public Spaces

Giovanni Cozzolongo, Berardina De Carolis, Sebastiano Pizzutilo
Intelligent Interfaces, Department of Informatics, University of Bari
{cozzolongo, decarolis, pizzutilo}@di.uniba.it

In this paper, we present a distributed architecture supporting personalized interaction with information services available in public environments. In particular, we investigate how a personal User Modeling Agent (UMA) has been designed and implemented to achieve personalized interaction when the user interacts with the environment using his/her device.

Abstract
In this paper, we describe an architecture supporting personalized interaction with information services available in public spaces. In particular, we focus the presentation on the personal user modeling agent that runs on the user's mobile device and is able to exchange data with the environment in order to get personalized information.

This agent has to model the user's behaviour and to transfer to the Environment Agent what can be inferred about the user in that domain, according to the required service. Obviously, these two entities have to understand each other; to this end, the UMA has to transfer information in an XML-based language compliant with some ontologies. To this aim, our Agent uses a simplification of the UbisWorld language and ontologies [20].

Keywords
User modeling agents, embodied conversational agent.

INTRODUCTION
Interaction with information services available in public spaces may be provided in different ways. Usually, they can be accessed using public kiosks, LCD displays or, if the user has his/her own personal device, this can be used to interact with the environment in a private way. However, this does not mean that the interaction is personalized in order to meet the requirements and interests of the user. Thus, the physical space has to be designed so as to ‘sense’ the particular users being in the environment and to ‘know’ their interests and preferences; the environment can then use this information to create more specifically targeted presentations [4, 8].

As far as interaction is concerned, it may happen in two ways: i) in the presence of environment information display facilities (kiosk, LCD display, etc.), the user may decide to interact directly with the environment devices, or ii) using his/her personal device. In the first case, we use an Embodied Conversational Agent (ECA) as an interaction metaphor for representing the environment information service [2] and, in this case, the personal device is used in a transparent way in order to provide user modeling services to the environment. ECAs are software agents with a more or less realistic and ‘human-like’ embodiment, which exhibit many of the properties of humans in face-to-face conversation, including the ability to provide a natural interaction and enforce social involvement [1, 10]. They provide a type of multimodal interface where the agent can dialogue with the user by employing verbal and non-verbal expressions, showing human-like qualities such as personality and emotions [1, 16].

With this aim, there are two possible approaches to user modeling: in the first one, user modeling is one of the tasks of the personalization component of the smart environment; in the second one, the user has his/her user model on the personal device. In this case, from the architectural point of view, it may be implemented as a “user-trusted” server and may be replicated at the beginning of each interaction with an environment, or it may stay with the user all the time, either on his or her personal device or on a gadget that the user always wears [14, 9]. If this approach is adopted, interaction may become personal even in a public space. Users may move around with their own personal device: when in proximity of an information access point, a service discovery protocol can be activated, user-related information may be transferred to the environment and, then, the requested information may be adapted to that user, who can handle the interaction through her/his device or through a public kiosk that is connected to the personal device for user modeling purposes [3].

In the second case, the information is presented and organized on the user's personal device according to a predefined interface structure whose content is generated dynamically. In a first version of the system, we adopted a mixed approach where the information was shown partially on the user device and partially on the public kiosk, leaving however the interaction control on the PDA [3]. After an informal evaluation, we discovered that this approach was distracting and sometimes confusing for the user, since he/she had to follow the ECA presentation and then look at the personal device in order to decide what to do in the


outlined in the last Section.

following interaction move. However, a further evaluation study has been planned in order to test and compare the two approaches.

SYSTEM ARCHITECTURE
The functional architecture of the system is illustrated in Figure 1. It includes two main components: the Environment Agent, which handles access to service information and public interaction, and the Personal Device Agent running on the user's computing device, which handles user modeling and personal interaction. Communication between the two Agents is performed using communication protocols exchanging ACL-compliant messages whose content is expressed according to XML-based languages [13].

The paper is structured as follows: after a brief description of the system architecture, we will explain how the Environment Agent and the UMA are designed and implemented by showing an example of their application in a tourist information point. This is just an example domain used to test our approach; the presented architecture has been designed in a domain-independent way and can be used to support the interaction with different types of environments. Obviously, domain-dependent knowledge bases have to be implemented according to the environment scope. Conclusions are

Let's see in more detail how these two components work.


Figure 1: Functional Architecture of the Distributed System

physical world to which the Agent may refer), ii) the agent model (goals and relations among them, personalization strategies), iii) the user-related information (acquired by the User Modeling Agent during the interaction) and iv) the interaction history.

The Environment Agent
The architecture of the Environment Agent is based on the model of a Natural Language Generation (NLG) system [18]. Given a set of goals to be achieved in the selected domain (tourist information service in this case), the Agent plans what to communicate to the user and decides how to render it according to the expressive capabilities of its “body” and to the conversational context. In this view, the Agent is seen as an entity including two main components (a ‘Mind’ and a ‘Body’) which are interfaced by a common I/O language, so as to overcome integration problems and to allow their independence and modularity [6].

Suppose that the system is running in a Southern Italy Tourist Information Point providing information about the region. Initially, according to the location and the environment settings, the initial environment and agent models are triggered (Table 1). The Environment model stores information about the type of interaction space and the list of objects and places that can be referenced within the digital space or the real one. This knowledge about the environment enables the agent to use, for instance, deixis to refer to objects mentioned in the interaction.

Then, in a given phase of the interaction, given a goal, the environment agent selects a suitable plan from a library of plan recipes relative to that domain. The selection is based on the current Agent Internal State (AIS), which stores the following information: i) the environment model (information about the application domain, type of interaction, objects or points of interest in the digital or

If the interaction happens through an ECA displayed in a public kiosk then the Agent’s physical and behavioural features are set to meet the requirements of the expected

65

target users (adults with interest in visiting the region, knowing about suggested itineraries, local attractions, food and so on) and of the cultural behaviour which is typical of the agency location [7].

the tree whose focus matches the user’s preference for that category of service. In case user preferences are not accessible or available, the system will present all the possible alternatives.

These features influence the agent's embodiment and behaviour. We selected a dark-haired female face showing a “warm”, friendly behaviour. The role of a travel agent does not require a strong empathic attitude; rather, she has to establish a social relation with the user by showing that she understands the user's feedback.

Table 1: an example of AIS in the Tourist Information domain

When the presentation plan that satisfies the current constraints is selected, information to be presented is organized accordingly. The produced result is specified using an extension of APML (Affective Presentation Markup Language: [5]). This resulting string combines the text with tags which specify the ‘meanings’ that the Body will have to attach to each part of it. These meanings include the communicative functions that are typically used in human-human dialogs (emphasis, affective, meta-cognitive, performative, deictic, adjectival and belief relation functions) [15] and the background relative to the information being conveyed.

Environment Features
Type: Touristic Information Point
Interaction Space: Public
Space References: (infopanel, coord1), (front desk, coord2), …

Agent Features
Role: Travel Agent
Personality: Friendly
Gender: Female
Culture: Southern Italy
Emphatic Attitude: Medium

Default Agent Goals
Describe(Role(Agent)), Present(touristic-facilities)
∀x | x = Touristic-place: Suggest(x), DescribeinGeneral(x), Describe(SelectedItineraries(x)), Describe(Art&Culture(x)), Describe(Nature(x)), Describe(Accomodation(x))

An example of APML for the Inform(PublicTransportation(touristic-place)) leaf of the plan in Figure 2 is shown in the following table:

Table 2: an example of extended APML string
Taxi Information | Rent Car | other | bye | bus-timetable
"Public Transportation in Bari: buses run approximately every 30 minutes. They are always late. More information can be obtained by calling 800600800!"

The agent's domain-knowledge (touristic info in this case) is directly related to the environment in which interaction occurs. As a consequence, corresponding default goals and communicative plans of the agent are activated, as shown in the last section of Table 1: first of all, the agent introduces herself; then, she presents the tourist facilities, etc. During the interaction, the agent's goals can be updated according to explicit user requests.

Once the APML string has been generated, the expressed meanings can be rendered according to the interaction modality.


In case the user is interacting with the ECA, the string will be interpreted by the Body Wrapper module in the following way (Figure 3):


• the tag specifies the agent's background information; this can be rendered as a set of buttons, a web page, a table, and so on, according to the specified type. In the previous example, the tag of the type “ask-for” will be rendered as a set of mutually exclusive clickable buttons and a table showing the bus timetable.
• the tag specifies the main message, comment or explanation about the background information that will be delivered by the agent. In this phase the Body Wrapper decides which combination of body signals


Figure 2: Example of Discourse Plan

Figure 2 illustrates the structure of a plan for Presenting Tourist Facilities, which is stored in DPML (Discourse Plan Markup Language) [5]. It first presents general information about the available services, then it describes each service interactively in detail. The presentation can be adapted to user preferences by showing the branch of


(verbal and non-verbal behaviours) to use to convey every meaning specified in the APML move. In this way, the agent move, tagged at the meaning level, can be coupled with different bodies using a meaning-signal table [6]. So far, we have developed APML wrappers for Greta [15], Haptek [12] and MS-Agent technology.
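A toy Python sketch of such a coupling; the meanings and body signals listed here are illustrative placeholders, not the actual APML/Greta vocabulary or the real meaning-signal tables.

# per-body meaning-signal tables (entries are made up for illustration)
MEANING_SIGNALS = {
    "greta":    {"emphasis": "raised eyebrows + beat gesture", "deictic": "pointing gesture"},
    "ms-agent": {"emphasis": "nod",                            "deictic": "move pointer to target"},
}

def render_move(tagged_move, body):
    """Couple a move tagged at the meaning level with the signals of one body."""
    table = MEANING_SIGNALS[body]
    return [(text, table.get(meaning, "neutral delivery")) for meaning, text in tagged_move]

move = [("emphasis", "buses run approximately every 30 minutes"),
        ("deictic", "the timetable shown on the info panel")]
print(render_move(move, "greta"))
print(render_move(move, "ms-agent"))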


Figure 4: an example of interface on the personal device generated from the APML string in Table 3.

The Personal Device Agent
This component enables the user to personalize the interaction with the environment agent. It includes two sub-components, aimed at modeling the user and, as shown previously, at providing an interface for exchanging messages (user and agent moves) with the environment. As far as user modeling is concerned, possible approaches to these problems are a centralized, a distributed or a mobile approach [14]. All these approaches present advantages and disadvantages. In traditional client-server information systems, the most frequent design choice is to store the User Model on the server side, enabling the user to access her/his model after having been recognized by the system. In the distributed solution, user information is stored on different servers, reducing the computational load but introducing problems of redundancy/incompleteness of user information and consistency. In the mobile approach the user always "brings" the user model with her/himself, for instance on a handheld device, and, when the interaction with an environment starts, her/his profile is passed to the environment user modeling component; this approach seems very promising since it presents several advantages. The information about the user is always available and updated, and can be accessed in a wireless and quite transparent way, avoiding problems related to the consistency of the model, since there is always one single profile per user.

Figure 3: an example of interaction generated from the APML fragment in Table 2.

Table 3: another example of extended APML string
Special Offers | Itinerary | Restaurants | Transportation | bye
"Hi! I'm Maria! I'm here to provide you tourist information about this region of Puglia!"

If the user is interacting through his/her device, the APML move will be rendered, according to the structure of the personal interface (Figure 4), in the following way:

The architecture of the User Modeling Agent (UMA) is based on the mobile approach in the following sense:

• tags specified within the background that can be used to ask for information will be rendered as a set of clickable buttons (i.e. tags of the type “ask”).
• the tag specifies the main message that will be shown in the hyper/textual window of the personal interface, together with any focus information specified in the background. This transformation is performed by the Personal Device Agent by applying XSL rules to the APML string.
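A minimal sketch of this rendering path; since the exact APML markup is not reproduced in this excerpt, the element names used here (option, performative) are placeholders, and xml.etree stands in for the XSL machinery.

import xml.etree.ElementTree as ET

move = """<move>
  <option>Taxi Information</option><option>Rent Car</option><option>bye</option>
  <performative>Public transportation in Bari: buses run about every 30 minutes.</performative>
</move>"""

root = ET.fromstring(move)
buttons = [o.text for o in root.findall("option")]                    # clickable user moves
text_pane = " ".join(p.text for p in root.findall("performative"))    # hyper/textual window
print("Buttons:", buttons)
print("Text:", text_pane)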

A Personal Device is used mainly in situations of user mobility. Normally, when the user is in more “stable” environments (i.e. home, office, etc.) he/she will use other devices belonging to that environment (i.e. PC, house appliances, etc.). In this view, the Personal Device can be seen as a satellite of other “nucleus” devices that the user uses habitually in his/her daily life. Then, the UMA has to be able to handle this nucleus-satellite relation. With this aim, instead of implementing a truly mobile agent, the UMA is cloned and lives on all the user platforms/devices. However, although the chosen approach simplifies the implementation, it requires

When the UMA detects a situation that could require the use of a particular portion of the user model, it transfers the corresponding network to the personal device. In the current version of the system, a transfer is activated according to the user's to-do list [4], or on user request or, in presence of a network connection, it can be done according to the scope of the environment in which the user is interacting and the user task (i.e. the required service in our example). For instance, in the current example, the agent could transfer, to the personal device, the portion of the user model concerning "holidays", since the user has this entry in the to-do list or when he/she starts the interaction with the tourist information service.
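A minimal Python sketch of this transfer policy; the model hierarchy, to-do entries and trigger matching are illustrative assumptions, not the actual UMA data structures.

user_model = {                     # hierarchy of sub-models, keyed by their scope
    "Holiday":  {"prefers": ["cheap restaurants", "nature itineraries"]},
    "Work":     {"prefers": ["quiet meeting rooms"]},
    "FreeTime": {"prefers": ["sports events"]},
}
todo_list = ["book holiday accommodation"]

def portions_to_transfer(environment_scope, user_task=None):
    """Select the user-model portions triggered by the environment, task or to-do list."""
    triggers = {environment_scope, user_task}
    triggers |= {scope for scope in user_model
                 for entry in todo_list if scope.lower() in entry.lower()}
    return {scope: data for scope, data in user_model.items() if scope in triggers}

print(portions_to_transfer("Holiday", user_task="tourist information"))
# -> {'Holiday': {'prefers': ['cheap restaurants', 'nature itineraries']}}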

transferring the knowledge needed for user modelling and raises consistency problems in maintaining a global image of user preferences, interests, habitual behaviour, and so on. In our approach, user models are organized in a hierarchy [19] whose nodes represent relevant interaction environments, task families, and interest groups. Each entity in the hierarchy represents a subset of user model data relevant to the specified domain, task, etc. As far as the modeling strategy is concerned, the UMA employs a hierarchical approach to Belief Networks (HBN) [11] organized as in Figure 5.




As far as interaction with an active environment is concerned, when the user approaches one of the active points in the environment, the UMA, using the appropriate user modelling portion, provides information about the user preferences that are relevant for that particular domain.


These preferences are transferred as XML-annotated user info. The environment can then make its own reasoning about these preferences and adapt the interaction accordingly. However, information inferred by the agent has to be passed to the environment in an "understandable" way.


Figure 5: an example of User Model HBN

A solution to this problem is to make reference to an ontology, so that the Environment can give the right semantic interpretation to user data. In our system we transform inferred information into situational statements [20]; this language is able to integrate user features and data with situational statements and privacy settings in order to support ubiquitous interaction.

The roots of the hierarchy represent user modelling scopes (interaction environments). Nodes in lower levels of the HBN model specific subsets of user data. Dotted lines represent hierarchical dependencies while arrows represent causal links according to [11]. Each node represents a Belief Network (BN) aiming at modelling a default behaviour concerning that domain or task. Each network has a representation of the context as nodes that receive evidence when the user is interacting in that situation. Figure 6 represents a BN aiming at modelling user preferences and interests when on Holiday.

During the interaction, the UMA sends to the environment an XML string representing the situational statements relevant for that domain and interaction context. A situational statement has the following form:

In this prototype, in order to test the suitability of our approach and to simplify the propagation algorithm, we implemented a simple model in which a causal link connecting two macro-nodes represents a relation between a leaf node (origin of the causal link) and a root node (destination of the causal link). The budget node, in the BN in Figure 6, is an example of this situation.


90

List of Publications (# ::= out of print) electronic copies can be found at: http://w5.cs.uni-sb.de/Publist/ or ftp://ftp.cs.uni-sb.de/pub/papers/SFB378/ copies can be ordered from: Doris Borchers, Universit¨at des Saarlandes, FR 6.2: Department of Computer Science, Postfach 151 150, Im Stadtwald 15, D-66041 Saarbrucken, ¨ Fed. Rep. of Germany, e-mail: [email protected]

Reports B1 Schirra, J., Brach, U., Wahlster, W., Woll, W.: WILIE — Ein wissensbasiertes Literaturerfassungssystem. FB Informatik, KI-Labor, Bericht Nr. 1, April 1985. In: Endres-Niggemeyer, B., Krause, J. (eds.): Sprachverarbeitung in Information und Dokumentation. Heidelberg: Springer, 1985, 101–112. # B2 Arz, J.: TRICON — Ein System fur ¨ geometrische Konstruktionen mit naturlichsprachlicher ¨ Eingabe. FB Informatik, KI-Labor, Bericht Nr. 2, Mai 1985. In: Endres-Niggemeyer, B., Krause, J. (eds.): Sprachverarbeitung in Information und Dokumentation. Heidelberg: Springer, 1985, 113–123. B3 Wahlster, W., Kobsa, A.: Dialog-Based User Models. SFB 314 (XTRA), Bericht Nr. 3, Februar 1986. In: Ferrari, G. (ed.): Proceedings of the IEEE 74 (7). July 1986 (Special Issue On Language Processing), 984–960. #

B14 Jansen-Winkeln, R. M.: LEGAS — Inductive Learning of Grammatical Structures. FB Informatik, KI-Labor, Bericht Nr. 14, November 1986. In: Hallam, J., Mellish, C. (eds.): Advances in Artificial Intelligence. Proceedings of the AISB Conference. Chichester: Wiley, 1987, 169–181. #

¨ approB15 Werner, M.: RMSAI — Ein Reason Maintenance System fur ximative Inferenzen. FB Informatik, KI-Labor, Bericht Nr. 15, Dezember 1986. In: Stoyan, H. (ed.): Begrundungsverwaltung. ¨ Proceedings. Berlin etc.: Springer, 1988, 86–110. # B16 Schmauks, D.: Natural and Simulated Pointing — An Interdisciplinary Survey. SFB 314 (XTRA), Bericht Nr. 16, Dezember 1986. In: 3rd European ACL Conference, Kopenhagen, Danmark 1987. Proceedings. 179–185. B17 Zimmermann, G., Sung, C. K., Bosch, G., Schirra, J.R.J.: From Image Sequences to Natural Language: Description of Moving Objects. SFB 314 (VITRA), Bericht Nr.17, Januar 1987.# ´ E., Rist, T., Herzog,: G. Generierung naturlichsprachli¨ B18 Andre, ¨ cher Außerungen zur simultanen Beschreibung von zeitver¨anderlichen Szenen. SFB 314 (VITRA), Bericht Nr. 18, April 1987. In: Morik, K. (ed.): GWAI-87. 11th German Workshop on Artificial Intelligence. Proceedings. Berlin/Heidelberg: Springer, 1987, 330–338. # ´ E.: Ereignismodellierung zur inkreB19 Rist, T., Herzog, G., Andre, mentellen High-level Bildfolgenanalyse. SFB 314 (VITRA), Bericht Nr. 19, April 1987. In: Buchberger, E., Retti, J. (eds.): 3. ¨ Osterreichische Artificial-Intelligence-Tagung. Proceedings. Berlin/Heidelberg: Springer, 1987, 1–11. #

B4 Fendler, M., Wichlacz, R.: SYCON — Ein Rahmensystem zur Constraint-Propagierung auf Netzwerken von beliebigen symbolischen Constraints. FB Informatik, KI-Labor, Bericht Nr. 4, November 1985. In: Stoyan, H. (ed.): GWAI-85. 9th German Workshop on Artificial Intelligence. Proceedings. Heidelberg: Springer 1985, 36–45.

B20 Beiche, H.-P.: LST-1. Ein wissensbasiertes System zur Durchfuh¨ rung und Berechnung des Lohnsteuerjahresausgleichs. SFB 314 (XTRA), Bericht Nr. 20, April 1987. In: Buchberger, E., Retti, J. ¨ (eds.): 3. Osterreichische Artificial-Intelligence-Tagung. Proceedings. Berlin/Heidelberg: Springer, 1987, 92–103. #

B5 Kemke, C.: Entwurf eines aktiven, wissensbasierten Hilfesystems fur ¨ SINIX. FB Informatik, KI-Labor (SC-Projekt), Bericht Nr. 5, Dezember 1985. Erweiterte Fassung von: SC — Ein intelligentes Hilfesystem fur ¨ SINIX. In: LDV-Forum. Nr. 2, Dezember 1985, 43–60. #

B21 Hecking, M.: How to Use Plan Recognition to Improve the Abilities of the Intelligent Help System SINIX Consultant. FB Informatik, KI-Labor (SC-Projekt), Bericht Nr. 21, September 1987. In: Bullinger, H.-J., B. Shackel: Human–Computer Interaction — INTERACT 87. Proceedings of the 2nd IFIP Conference on Human– Computer Interaction. Amsterdam: North Holland, 1987, 657–662.

B6 Wahlster, W.: The Role of Natural Language in Advanced Knowledge-Based Systems. SFB 314 (XTRA), Bericht Nr. 6, Januar 1986. In: Winter, H. (ed.): Artificial Intelligence and Man-Machine Systems. Heidelberg: Springer, 1986, 62–83. B7 Kobsa, A., Allgayer, J., Reddig, C., Reithinger, N., Schmauks, D., Harbusch, K., Wahlster, W.: Combining Deictic Gestures and Natural Language for Referent Identification. SFB 314 (XTRA), Bericht Nr. 7, April 1986. In: COLING ’86. 11th International Conference on Computational Linguistics. Proceedings. Bonn 1986, 356– 361. B8 Allgayer, J.: Eine Graphikkomponente zur Integration von Zeigehandlungen in naturlichsprachliche ¨ KI-Systeme. SFB 314 (XTRA), Bericht Nr. 8, Mai 1986. In: GI – 16. Jahrestagung. Proceedings, Bd. 1. Berlin 1986, 284–298. ´ E., Bosch, G., Herzog, G., Rist, T.: Coping with the Intrinsic B9 Andre, and Deictic Uses of Spatial Prepositions. SFB 314 (VITRA), Bericht Nr. 9, Juli 1986. In: Jorrand, Ph., Sgurev, V. (eds.): Artificial Intelligence II. Proceedings of AIMSA-86. Amsterdam: North-Holland, 1987, 375–382. # B10 Schmauks, D.: Form und Funktion von Zeigegesten, Ein interdiszi¨ plin¨arer Uberblick. SFB 314 (XTRA), Bericht Nr. 10, Oktober 1986. B11 Kemke, C.: The SINIX Consultant — Requirements, Design, and Implementation of an Intelligent Help System for a UNIX Derivative. FB Informatik, KI-Labor (SC-Projekt), Bericht Nr. 11, Oktober 1986. In: User Interfaces. Proceedings of the International Conference of the Gottlieb Duttweiler Institut. Ruschlikon/Z ¨ urich ¨ 1986. B12 Allgayer, J., Reddig, C.: Processing Descriptions containing Words and Gestures. SFB 314 (XTRA), Bericht Nr. 12, Oktober 1986. In: ¨ Rollinger, C.-R., Horn, W. (eds.): GWAI-86 und 2. Osterreichische Artificial-Intelligence-Tagung. Proceedings. Berlin/Heidelberg: Springer, 1986, 119–130. # B13 Reithinger, N.: Generating Referring Expressions and Pointing Gestures. SFB 314 (XTRA), Bericht Nr. 13, November 1986. In: Kempen, G. (ed.): Natural Language Generation. Dordrecht: Nijhoff, 1987, 71–81. #

B22 Kemke, C.: Representation of Domain Knowledge in an Intelligent Help System. FB Informatik, KI-Labor (SC-Projekt), Bericht Nr. 22, September 1987. In: Bullinger, H.-J., B. Shackel: Human– Computer Interaction — INTERACT 87. Proceedings of the 2nd IFIP Conference on Human–Computer Interaction. Amsterdam: North Holland, 1987, 215–220. # B23 Reithinger, N.: Ein erster Blick auf POPEL. Wie wird was gesagt? SFB 314 (XTRA), Bericht Nr. 23, Oktober 1987. In: Morik, K. (ed.): GWAI-87. 11th German Workshop on Artificial Intelligence. Proceedings. Berlin/Heidelberg: Springer 1987, 315–319. B24 Kemke, C.: Modelling Neural Networks by Means of Networks of Finite Automata. FB Informatik, KI-Labor, Bericht Nr. 24, Oktober 1987. In: IEEE, First International Conference on Neural Networks, San Diego, USA 1987. Proceedings. B25 Wahlster, W.: Ein Wort sagt mehr als 1000 Bilder. Zur automatischen Verbalisierung der Ergebnisse von Bildfolgenanalysesystemen. SFB 314 (VITRA), Bericht Nr. 25, Dezember 1987. In: Annales. Forschungsmagazin der Universit¨at des Saarlandes, 1.1, 1987, S.82–93. Wahlster, W.: One Word Says More Than a Thousand Pictures. On the Automatic Verbalization of the Results of Image Sequence Analysis Systems. SFB 314 (VITRA), Bericht Nr. 25, Februar 1988. In: T.A. Informations, Special Issue: Linguistique et Informatique ´ ´ en Republique Federale Allemande, September 1988. B26 Schirra, J.R.J., Bosch, G., Sung, C.K., Zimmermann, G.: From Image Sequences to Natural Language: A First Step towards Automatic Perception and Description of Motions. SFB 314 (VITRA), Bericht Nr. 26, Dezember 1987. In: Applied Artificial Intelligence, 1, 1987, 287–305. # B27 Kobsa, A., Wahlster, W. (eds.): The Relationship between User Models and Discourse Models: Two Position Papers. SFB 314 (XTRA), Bericht Nr. 27, Dezember 1987. Both papers of this report appear in: Computational Linguistics 14(3), Special Issue on User Modeling (Kobsa, A., Wahlster, W. (eds.)), 1988, 91–94 (Kobsa) and 101– 103 (Wahlster).

B28 Kobsa, A.: A Taxonomy of Beliefs and Goals for User Models in Dialog Systems. SFB 314 (XTRA), Bericht Nr. 28, Dezember 1987. In: Kobsa, A., Wahlster, W. (eds.): User models in dialog systems. Berlin etc.: Springer, 1988, 52–68.

B44 Kemke, C.: Representing Neural Network Models by Finite Automata. FB Informatik, KI-Labor, Bericht Nr. 44, August 1988. In: Proceedings of the 1st European Conference on Neural Networks “nEuro’88”, Paris 1988. #

B29 Schmauks, D., Reithinger, N.: Generating Multimodal Output — Conditions, Advantages and Problems. SFB 314 (XTRA), Bericht Nr. 29, Januar 1988. In: COLING-88, Budapest 1988. Proceedings. 584–588.

B45 Reddig, C.: “3D” in NLP: Determiners, Descriptions, and the Dialog Memory in the XTRA Project. SFB 314 (XTRA), Bericht Nr. 45, August 1988. In: Hoeppner, W. (ed.): Kunstliche ¨ Intelligenz. GWAI-88. 12th German Workshop on Artificial Intelligence. Proceedings. Berlin: Springer, 1988, 159–168. #

B30 Wahlster, W., Kobsa, A.: User Models in Dialog Systems. SFB 314 (XTRA), Bericht Nr. 30, Januar 1988. In: Kobsa, A., Wahlster, W. (eds.): User Models in Dialog Systems. Berlin: Springer, 1988, 4– 34. (Extended and revised version of Report No. 3.) ´ E., Herzog, G., Rist, T.: On the Simultaneous InterpretatiB31 Andre, on and Natural Language Description of Real World Image Sequences: The System SOCCER. SFB 314 (VITRA), Bericht Nr. 31, April 1988. In: ECAI-88. Proceedings. London: Pitman, 1988, 449– 454. #

B32 Kemke, C.: Der Neuere Konnektionismus. Ein Überblick. KI-Labor, Bericht Nr. 32, Mai 1988. In: Informatik-Spektrum, 11.3, Juni 1988, 143–162. #
B33 Retz-Schmidt, G.: Various Views on Spatial Prepositions. SFB 314 (VITRA), Bericht Nr. 33, Mai 1988. In: AI Magazine, 9.2, 1988, 95–105.
B34 Ripplinger, B., Kobsa, A.: PLUG: Benutzerführung auf Basis einer dynamisch veränderlichen Zielhierarchie. SFB 314 (XTRA), Bericht Nr. 34, Mai 1988. In: Hoeppner, W. (ed.): Künstliche Intelligenz. GWAI-88. 12th German Workshop on Artificial Intelligence. Proceedings. Berlin: Springer, 1988, 236–245.
B35 Schirra, J.R.J.: Deklarative Programme in einem Aktor-System: MEGA-ACT. FB Informatik, KI-Labor, Bericht Nr. 35, Mai 1988. In: KI, 2.3 / 2.4, 1988, 4–9/4–12.
B36 Retz-Schmidt, G.: A REPLAI of SOCCER: Recognizing Intentions in the Domain of Soccer Games. SFB 314 (VITRA), Bericht Nr. 36, Juni 1988. In: ECAI-88. Proceedings. London: Pitman, 1988, 455–457. #
B37 Wahlster, W.: User and Discourse Models for Multimodal Communication. SFB 314 (XTRA), Bericht Nr. 37, Juni 1988. In: Sullivan, J.W., Tyler, S.W. (eds.): Architectures for Intelligent Interfaces: Elements and Prototypes. Reading: Addison-Wesley, 1988.
B38 Harbusch, K.: Effiziente Analyse natürlicher Sprache mit TAGs. FB Informatik, KI-Labor, Bericht Nr. 38, Juni 1988. In: Batori, I.S., Hahn, U., Pinkal, M., Wahlster, W. (eds.): Computerlinguistik und ihre theoretischen Grundlagen. Symposium, Saarbrücken, März 1988. Proceedings. Berlin etc.: Springer, 1988, 79–103. #
B39 Schifferer, K.: TAGDevEnv. Eine Werkbank für TAGs. FB Informatik, KI-Labor, Bericht Nr. 39, Juni 1988. In: Batori, I.S., Hahn, U., Pinkal, M., Wahlster, W. (eds.): Computerlinguistik und ihre theoretischen Grundlagen. Symposium, Saarbrücken, März 1988. Proceedings. Berlin etc.: Springer, 1988, 152–171. #
B40 Finkler, W., Neumann, G.: MORPHIX. A Fast Realization of a Classification-Based Approach to Morphology. SFB 314 (XTRA), Bericht Nr. 40, Juni 1988. In: Trost, H. (ed.): 4. Österreichische Artificial-Intelligence-Tagung. Wiener Workshop - Wissensbasierte Sprachverarbeitung. Proceedings. Berlin etc.: Springer, 1988, 11–19. #
B41 Harbusch, K.: Tree Adjoining Grammars mit Unifikation. FB Informatik, KI-Labor, Bericht Nr. 41, Juni 1988. In: Trost, H. (ed.): 4. Österreichische Artificial-Intelligence-Tagung. Wiener Workshop - Wissensbasierte Sprachverarbeitung. Proceedings. Berlin etc.: Springer, 1988, 188–194. #
B42 Wahlster, W., Hecking, M., Kemke, C.: SC: Ein intelligentes Hilfesystem für SINIX. FB Informatik, KI-Labor, Bericht Nr. 42, August 1988. In: Gollan, W., Paul, W., Schmitt, A. (eds.): Innovative Informationsinfrastrukturen, Informatik-Fachberichte Nr. 184, Berlin: Springer, 1988.
B43 Wahlster, W.: Natural Language Systems: Some Research Trends. FB Informatik, KI-Labor, Bericht Nr. 43, August 1988. In: Schnelle, H., Bernsen, N.O. (eds.): Logic and Linguistics. Research Directions in Cognitive Science: European Perspectives, Vol. 2, Hove: Lawrence Erlbaum, 1989, 171–183.

B46 Scheller, A.: PARTIKO. Kontextsensitive, wissensbasierte Schreibfehleranalyse und -korrektur. FB Informatik, KI-Labor, Bericht Nr. 46, August 1988. In: Batori, I.S., Hahn, U., Pinkal, M., Wahlster, W. (eds.): Computerlinguistik und ihre theoretischen Grundlagen. Symposium, Saarbrücken, März 1988. Proceedings. Berlin: Springer, 1988, 136–151.
B47 Kemke, C.: Darstellung von Aktionen in Vererbungshierarchien. FB Informatik, KI-Labor, Bericht Nr. 47, September 1988. In: Hoeppner, W. (ed.): Künstliche Intelligenz. GWAI-88. 12th German Workshop on Artificial Intelligence. Proceedings. Berlin: Springer, 1988, 306–307.
B48 Jansen-Winkeln, R.M.: WASTL: An Approach to Knowledge Acquisition in the Natural Language Domain. SFB 314 (XTRA), Bericht Nr. 48, September 1988. In: Boose, J., et al. (eds.): Proceedings of the European Knowledge Acquisition Workshop (EKAW ’88), Bonn 1988, 22-1–22-15.
B49 Kemke, C.: What Do You Know About Mail? Representation of Commands in the SINIX Consultant. FB Informatik, KI-Labor, Bericht Nr. 49, Dezember 1988. In: Norvig/Wahlster/Wilensky (eds.): Intelligent Help Systems for UNIX. Berlin: Springer, 1989. #
B50 Hecking, M.: Towards a Belief-Oriented Theory of Plan Recognition. FB Informatik, KI-Labor, Bericht Nr. 50, Dezember 1988. In: Proceedings of the AAAI-88 Workshop on Plan Recognition. #
B51 Hecking, M.: The SINIX Consultant — Towards a Theoretical Treatment of Plan Recognition. FB Informatik, KI-Labor, Bericht Nr. 51, Januar 1989. In: Norvig/Wahlster/Wilensky (eds.): Intelligent Help Systems for UNIX. Berlin: Springer, 1989. #
B52 Schmauks, D.: Die Ambiguität von ’Multimedialität’ oder: Was bedeutet ’multimediale Interaktion’? SFB 314 (XTRA), Bericht Nr. 52, Februar 1989. In: Endres-Niggemeyer/Hermann/Kobsa/Rösner (eds.): Interaktion und Kommunikation mit dem Computer. Berlin: Springer, 1989, 94–103. #
B53 Finkler, W., Neumann, G.: POPEL-HOW. A Distributed Parallel Model for Incremental Natural Language Production with Feedback. SFB 314 (XTRA), Bericht Nr. 53, Mai 1989. In: IJCAI-89. Proceedings. 1518–1523. #
B54 Jung, J., Kresse, A., Reithinger, N., Schäfer, R.: Das System ZORA. Wissensbasierte Generierung von Zeigegesten. SFB 314 (XTRA), Bericht Nr. 54, Juni 1989. In: Metzing, D. (ed.): GWAI-89. Proceedings. Berlin: Springer, 1989, 190–194.
B55 Schirra, J.R.J.: Ein erster Blick auf ANTLIMA: Visualisierung statischer räumlicher Relationen. SFB 314 (VITRA), Bericht Nr. 55, Juni 1989. In: Metzing, D. (ed.): GWAI-89. Proceedings. Berlin: Springer, 1989, 301–311. #
B56 Hays, E.: Two views of motion: On representing move events in a language-vision system. SFB 314 (VITRA), Bericht Nr. 56, Juli 1989. In: Metzing, D. (ed.): GWAI-89. Proceedings. Berlin: Springer, 1989, 312–317. #
B57 Allgayer, J., Harbusch, K., Kobsa, A., Reddig, C., Reithinger, N., Schmauks, D.: XTRA: A natural language access system to expert systems. SFB 314 (XTRA), Bericht Nr. 57, Juli 1989. In: International Journal of Man-Machine Studies, 161–195.
B58 Herzog, G., Sung, C.-K., André, E., Enkelmann, W., Nagel, H.-H., Rist, T., Wahlster, W., Zimmermann, G.: Incremental Natural Language Description of Dynamic Imagery. SFB 314 (VITRA), Bericht Nr. 58, August 1989. In: Brauer, W., Freksa, C. (eds.): Wissensbasierte Systeme. Proceedings. Berlin: Springer, 1989, 153–162.

B59 Schirra, J.R.J.: Einige Überlegungen zu Bildvorstellungen in kognitiven Systemen. SFB 314 (VITRA), Bericht Nr. 59, August 1989. In: Freksa/Habel (eds.): Repräsentation und Verarbeitung räumlichen Wissens. Proceedings. Berlin: Springer, 1989, 68–82. #

B60 Herzog, G., Rist, T., André, E.: Sprache und Raum: natürlichsprachlicher Zugang zu visuellen Daten. SFB 314 (VITRA), Bericht Nr. 60, August 1989. In: Freksa/Habel (eds.): Repräsentation und Verarbeitung räumlichen Wissens. Proceedings. Berlin: Springer, 1989, 207–220. #

B76 Kobsa, A.: Utilizing Knowledge: The Components of the SB-ONE Knowledge Representation Workbench. SFB 314 (XTRA), Bericht Nr. 76, Dezember 1990. In: Sowa, John (ed.): Principles of Semantic Networks: Explorations in the Representation of Knowledge. San Mateo, CA: Morgan Kaufmann, 1990, 457–486.

B61 Hays, E.M.: On Defining Motion Verbs and Spatial Prepositions. SFB 314 (VITRA), Bericht Nr. 61, Oktober 1989. In: Freksa/Habel (eds.): Repräsentation und Verarbeitung räumlichen Wissens. Proceedings. Berlin: Springer, 1989, 192–206. #

B77 Retz-Schmidt, G.: Recognizing Intentions, Interactions, and Causes of Plan Failures. SFB 314 (VITRA), Bericht Nr. 77, Januar 1991. In: User Modeling and User-Adapted Interaction, 1, 1991, 173–202.

B62 Herzog, G., Retz-Schmidt, G.: Das System SOCCER: Simultane Interpretation und natürlichsprachliche Beschreibung zeitveränderlicher Szenen. SFB 314 (VITRA), Bericht Nr. 62, Oktober 1989. In: Perl, J. (ed.): Sport und Informatik. Schorndorf: Hofmann, 1989. #

B78 Kobsa, A.: First Experiences with the SB-ONE Knowledge Representation Workbench in Natural-Language Applications. SFB 314 (XTRA), Bericht Nr. 78, Juni 1991. In: AAAI Spring Symposium on Implemented Knowledge Representation and Reasoning Systems, Summer 1991, Stanford, CA, 125–139.

B63 André, E., Herzog, G., Rist, T.: Natural Language Access to Visual Data: Dealing with Space and Movement. SFB 314 (VITRA), Bericht Nr. 63, November 1989. In: Nef, F., Borillo, M. (eds.): 1st Workshop on Logical Semantics of Time, Space and Movement in Natural Language. Proceedings. Edition Hermes. #

B79 Schmauks, D.: Referent identification by pointing – classification of complex phenomena. SFB 314 (XTRA), Bericht Nr. 79, Juli 1991. In: Geiger, Richard A. (ed.): Reference in Multidisciplinary Perspective: Philosophical Object, Cognitive Subject, Intersubjective Process. Hildesheim: Georg Olms Verlag, 1994.

B64 Kobsa, A.: User Modeling in Dialog Systems: Potentials and Hazards. SFB 314 (XTRA), Bericht Nr. 64, Januar 1990. In: AI and Society. The Journal of Human and Machine Intelligence 4, 214–231. #

B80 Tetzlaff, M., Retz-Schmidt, G.: Methods for the Intentional Description of Image Sequences. SFB 314 (VITRA), Bericht Nr. 80, August 1991. In: Brauer, W., Hernandez, D. (eds.): Verteilte KI und kooperatives Arbeiten. 4. Internationaler GI-Kongreß, 1991, Springer, 433–442.

B65 Reithinger, N.: POPEL — A Parallel and Incremental Natural Language Generation System. SFB 314 (XTRA), Bericht Nr. 65, Februar 1990. In: Paris, C., Swartout, W., Mann, W. (eds.): Natural Language Generation in Artificial Intelligence and Computational Linguistics. Kluwer, 1990, 179–199. #

B81 Schmauks, D.: Verbale und nonverbale Zeichen in der Mensch-Maschine-Interaktion. Die Klassifikation von Pilzen. SFB 314 (XTRA), Bericht Nr. 81, November 1991. In: Zeitschrift für Semiotik 16, 1994, 75–87.

B66 Allgayer, J., Reddig, C.: What KL-ONE Lookalikes Need to Cope with Natural Language — Scope and Aspect of Plural Noun Phrases. SFB 314 (XTRA), Bericht Nr. 66, Februar 1990. In: Bläsius, K., Hedstück, U., Rollinger, C.-R. (eds.): Sorts and Types in Artificial Intelligence. Springer, 1990, 240–286.

B82 Reithinger, N.: The Performance of an Incremental Generation Component for Multi-Modal Dialog Contributions. SFB 314 (PRACMA), Bericht Nr. 82, Januar 1992. In: Proceedings of the 6. International Workshop on Natural Language Generation, 1992, Springer.

B67 Allgayer, J., Jansen-Winkeln, R., Reddig, C., Reithinger, N.: Bidirectional use of knowledge in the multi-modal NL access system XTRA. SFB 314 (XTRA), Bericht Nr. 67, November 1989. In: Proceedings of IJCAI-89, 1492–1497.
B68 Allgayer, J., Reddig, C.: What’s in a ’DET’? Steps towards Determiner-Dependent Inferencing. SFB 314 (XTRA), Bericht Nr. 68, April 1990. In: Jorrand, P., Sendov, B. (eds.): Artificial Intelligence IV: methodology, systems, applications. Amsterdam: North-Holland, 1990, 319–329.
B69 Allgayer, J.: SB-ONE+ — dealing with sets efficiently. SFB 314 (XTRA), Bericht Nr. 69, Mai 1990. In: Proceedings of ECAI-90, 13–18.
B70 Allgayer, J., Kobsa, A., Reddig, C., Reithinger, N.: PRACMA: PRocessing Arguments between Controversially-Minded Agents. SFB 314 (PRACMA), Bericht Nr. 70, Juni 1990. In: Proceedings of the Fifth Rocky Mountain Conference on AI: Pragmatics in Artificial Intelligence, Las Cruces, NM, 63–68.

B83 Jansen-Winkeln, R.M., Ndiaye, A., Reithinger, N.: FSS-WASTL: Interactive Knowledge Acquisition for a Semantic Lexicon. SFB 314 (XTRA), Bericht Nr. 83, Februar 1992. In: Ardizzone, E., Gaglio, S., Sorbello, F. (eds.): Trends in Artificial Intelligence, Proceedings of the second AI∗IA 1991, Lecture Notes in Artificial Intelligence 529, 1991, Springer, 108–116.
B84 Kipper, B.: MODALYS: A System for the Semantic-Pragmatic Analysis of Modal Verbs. SFB 314 (PRACMA), Bericht Nr. 84, Mai 1992. In: Proceedings of the 5th International Conference on Artificial Intelligence - Methodology, Systems, Applications (AIMSA 92), September 1992, Sofia, Bulgaria, 171–180.
B85 Schmauks, D.: Untersuchung nicht-kooperativer Interaktionen. SFB 314 (PRACMA), Bericht Nr. 85, Juni 1992. In: Dialogisches Handeln. Festschrift zum 60. Geburtstag von Kuno Lorenz.
B86 Allgayer, J., Franconi, E.: A Semantic Account of Plural Entities within a Hybrid Representation System. SFB 314 (PRACMA), Bericht Nr. 86, Juni 1992. In: Proceedings of the 5th International Symposium on Knowledge Engineering, Sevilla, 1992.

B71 Kobsa, A.: Modeling the User’s Conceptual Knowledge in BGPMS, a User Modeling Shell System. SFB 314 (XTRA), Bericht Nr. 71, September 1990 (revised version). In: Computational Intelligence 6(4), 1990, 193–208.

B87 Allgayer, J., Ohlbach, H. J., Reddig, C.: Modelling Agents with Logic. Extended Abstract. SFB 314 (PRACMA), Bericht Nr. 87, Juni 1992. In: Proceedings of the 3rd International Workshop on User Modelling, Dagstuhl, 1992.

B72 Schirra, J.R.J.: Expansion von Ereignis-Propositionen zur Visualisierung. Die Grundlagen der begrifflichen Analyse von ANTLIMA. SFB 314 (VITRA), Bericht Nr. 72, Juli 1990. In: Proceedings of GWAI-90, 246–256. #

B88 Kipper, B.: Eine Disambiguierungskomponente für Modalverben. SFB 314 (PRACMA), Bericht Nr. 88, Juli 1992. In: Tagungsband der ersten Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 92), Nürnberg, 1992, 258–267.

B73 Schäfer, R.: SPREADIAC. Intelligente Pfadsuche und -bewertung auf Vererbungsnetzen zur Verarbeitung impliziter Referenzen. SFB 314 (XTRA), Bericht Nr. 73, August 1990. In: Proceedings of GWAI-90, 231–235.

B89 Schmauks, D.: Was heißt ”taktil” im Bereich der Mensch-Maschine-Interaktion? SFB 314 (PRACMA), Bericht Nr. 89, August 1992. In: Proceedings des 3. Internationalen Symposiums für Informationswissenschaft ISI’92, September 1992, Saarbrücken, 13–25.

B74 Schmauks, D., Wille, M.: Integration of communicative hand movements into human-computer-interaction. SFB 314 (XTRA), Bericht Nr. 74, November 1990. In: Computers and the Humanities 25, 1991, 129–140.
B75 Schirra, J.R.J.: A Contribution to Reference Semantics of Spatial Prepositions: The Visualization Problem and its Solution in VITRA. SFB 314 (VITRA), Bericht Nr. 75, Dezember 1990. In: Zelinsky-Wibbelt, Cornelia (ed.): The Semantics of Prepositions – From Mental Processing to Natural Language Processing. Berlin: Mouton de Gruyter, 1993, 471–515. #

B90 Schirra, J.R.J.: Connecting Visual and Verbal Space: Preliminary Considerations Concerning the Concept ’Mental Image’. SFB 314 (VITRA), Bericht Nr. 90, November 1992. In: Proceedings of the 4th European Workshop “Semantics of Time, Space and Movement and Spatio-Temporal Reasoning”, September 4-8, 1992, Château de Bonas, France, 105–121.
B91 Herzog, G.: Utilizing Interval-Based Event Representations for Incremental High-Level Scene Analysis. SFB 314 (VITRA), Bericht Nr. 91, November 1992. In: Proceedings of the 4th European Workshop “Semantics of Time, Space and Movement and Spatio-Temporal Reasoning”, September 4-8, 1992, Château de Bonas, France, 425–435.
B92 Herzog, G., Maaß, W., Wazinski, P.: VITRA GUIDE: Utilisation du Langage Naturel et de Représentations Graphiques pour la Description d’Itinéraires. SFB 314 (VITRA), Bericht Nr. 92, Januar 1993. In: Colloque Interdisciplinaire du Comité National ”Images et Langages: Multimodalité et Modélisation Cognitive”, Paris, 1993, 243–251.
B93 Maaß, W., Wazinski, P., Herzog, G.: VITRA GUIDE: Multimodal Route Descriptions for Computer Assisted Vehicle Navigation. SFB 314 (VITRA), Bericht Nr. 93, Februar 1993. In: Proceedings of the Sixth International Conference on Industrial & Engineering Applications on Artificial Intelligence & Expert Systems, Edinburgh, U.K., June 1-4, 1993, 144–147.
B94 Schirra, J.R.J., Stopp, E.: ANTLIMA – A Listener Model with Mental Images. SFB 314 (VITRA), Bericht Nr. 94, März 1993. In: Proceedings of IJCAI-93, Chambéry, France, August 29 - September 3, 1993, 175–180.
B95 Maaß, W.: A Cognitive Model for the Process of Multimodal, Incremental Route Descriptions. SFB 314 (VITRA), Bericht Nr. 95, Mai 1993. In: Proceedings of the European Conference on Spatial Information Theory, Lecture Notes in Computer Science, Springer, Marciana Marina, Elba, Italy, September 19-22, 1993, 1–24.

B96 Jameson, A.: Müssen Dialogsysteme immer objektiv sein? Fragen wir sie selbst! SFB 314 (PRACMA), Bericht Nr. 96, Mai 1993. In: Künstliche Intelligenz 7(2), 1993, 75–81.
B97 Schneider, A.: Connectionist Simulation of Adaptive Processes in the Flight Control System of Migratory Locusts. Fachbereich Informatik, KI-Labor, Bericht Nr. 97, September 1993. In: Proceedings of Artificial Neural Networks in Engineering 1993, St. Louis, Missouri, USA: Intelligent Engineering Systems Through Artificial Neural Networks, Vol. 3, November 1993, ASME Press, New York, USA, 599–604.
B98 Kipper, B.: A Blackboard Architecture for Natural Language Analysis. SFB 314 (PRACMA), Bericht Nr. 98, Februar 1994. In: Proceedings of the Seventh Florida Artificial Intelligence Research Symposium (FLAIRS 94), May 5-7, 1994, Pensacola Beach, USA, 231–235.
B99 Gapp, K.-P.: Einsatz von Visualisierungstechniken bei der Analyse von Realweltbildfolgen. SFB 314 (VITRA), Bericht Nr. 99, April 1994. In: Tagungsband des 1. Workshops zum Thema visual computing, Darmstadt, 1994.
B100 Herzog, G., Wazinski, P.: VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. SFB 314 (VITRA), Bericht Nr. 100, April 1994. In: Artificial Intelligence Review Journal, 8(2), Special Volume on the Integration of Natural Language and Vision Processing, edited by P. Mc Kevitt, 1994, 175–187.
B101 Gapp, K.-P.: Basic Meanings of Spatial Relations: Computation and Evaluation in 3D Space. SFB 314 (VITRA), Bericht Nr. 101, April 1994. In: Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94), Seattle, Washington, USA, 1994, 1393–1398.
B102 Gapp, K.-P., Maaß, W.: Spatial Layout Identification and Incremental Descriptions. SFB 314 (VITRA), Bericht Nr. 102, Mai 1994. In: Proceedings of the Workshop on the Integration of Natural Language and Vision Processing, 12th National Conference on Artificial Intelligence (AAAI-94), Seattle, Washington, USA, August 2-3, 1994, 145–152.
B103 André, E., Herzog, G., Rist, T.: Multimedia Presentation of Interpreted Visual Data. SFB 314 (VITRA), Bericht Nr. 103, Juni 1994. In: Proceedings of the Workshop on the Integration of Natural Language and Vision Processing, 12th National Conference on Artificial Intelligence (AAAI-94), Seattle, Washington, USA, August 2-3, 1994, 74–82.
B104 Lueth, T.C., Laengle, Th., Herzog, G., Stopp, E., Rembold, U.: KANTRA: Human-Machine Interaction for Intelligent Robots Using Natural Language. SFB 314 (VITRA), Bericht Nr. 104, Juni 1994. In: Proceedings of the 3rd International Workshop on Robot and Human Communication (RO-MAN ’94), Nagoya University, Nagoya, Japan, July 18-20, 1994, 106–111.

B105 Kipper, B., Jameson, A.: Semantics and Pragmatics of Vague Probability Expressions. SFB 314 (PRACMA), Bericht Nr. 105, Juni 1994. In: Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, Atlanta, Georgia, USA, August 1994, 496–501.
B106 Jameson, A., Kipper, B., Ndiaye, A., Schäfer, R., Simons, J., Weis, T., Zimmermann, D.: Cooperating to Be Noncooperative: The Dialog System PRACMA. SFB 314 (PRACMA), Bericht Nr. 106, Juni 1994. In: Nebel, B., Dreschler-Fischer, L. (eds.): KI-94: Advances in Artificial Intelligence, Proceedings of the Eighteenth German Conference on Artificial Intelligence, Saarbrücken, Germany, September 1994, Berlin, Heidelberg: Springer, 106–117.
B107 Stopp, E., Gapp, K.-P., Herzog, G., Laengle, T., Lueth, T.C.: Utilizing Spatial Relations for Natural Language Access to an Autonomous Mobile Robot. SFB 314 (VITRA), Bericht Nr. 107, Juli 1994. In: Nebel, B., Dreschler-Fischer, L. (eds.): KI-94: Advances in Artificial Intelligence, Proceedings of the Eighteenth German Conference on Artificial Intelligence, Saarbrücken, Germany, September 1994, Berlin, Heidelberg: Springer, 39–50.
B108 Maaß, W.: From Vision to Multimodal Communication: Incremental Route Descriptions. SFB 314 (VITRA), Bericht Nr. 108, Juli 1994. In: Artificial Intelligence Review Journal, 8(2/3), Special Volume on the Integration of Natural Language and Vision Processing, 1994, 159–174.
B109 Ndiaye, A., Jameson, A.: Supporting Flexibility and Transmutability: Multi-Agent Processing and Role-Switching in a Pragmatically Oriented Dialog System. SFB 314 (PRACMA), Bericht Nr. 109, August 1994. In: Jorrand, P., Sgurev, V. (eds.): Proceedings of the Sixth Annual Conference on Artificial Intelligence: Methodology, Systems, Applications (AIMSA ’94), Sofia, Bulgaria, September 21-24, 1994, 381–390.
B110 Gapp, K.-P.: From Vision to Language: A Cognitive Approach to the Computation of Spatial Relations in 3D Space. SFB 314 (VITRA), Bericht Nr. 110, Oktober 1994. In: Proceedings of the First Conference on Cognitive Science in Industry, Luxembourg, 1994, 339–357.
B111 Gapp, K.-P.: A Computational Model of the Basic Meanings of Graded Composite Spatial Relations in 3-D Space. SFB 314 (VITRA), Bericht Nr. 111, Oktober 1994. In: Proceedings of the Advanced Geographic Data Modelling Workshop, Delft, The Netherlands, 1994.
B112 Schäfer, R.: Multidimensional Probabilistic Assessment of Interest and Knowledge in a Noncooperative Dialog Situation. SFB 314 (PRACMA), Bericht Nr. 112, Dezember 1994. In: Proceedings of ABIS-94: GI Workshop on Adaptivity and User Modeling in Interactive Software Systems, St. Augustin, October 1994, 46–62.
B113 André, E., Herzog, G., Rist, T.: Von der Bildfolge zur multimedialen Präsentation. SFB 314 (VITRA), Bericht Nr. 113, Februar 1995. In: Arbeitsgemeinschaft Simulation in der Gesellschaft für Informatik (ASIM), Mitteilungen aus den Arbeitskreisen, Heft Nr. 46, Fachtagung “Integration von Bild, Modell und Text”, Magdeburg, 2.-3. März 1995, 129–142.
B114 Laengle, T., Lueth, T.C., Stopp, E., Herzog, G., Kamstrup, G.: KANTRA - A Natural Language Interface for Intelligent Robots. SFB 314 (VITRA), Bericht Nr. 114, März 1995. In: Proc. of the 4th International Conference on Intelligent Autonomous Systems, Karlsruhe, Germany, 1995, 357–364.
B115 Gapp, K.-P.: Angle, Distance, Shape, and their Relationship to Projective Relations. SFB 314 (VITRA), Bericht Nr. 115, Mai 1995. In: Proc. of the 17th Conference of the Cognitive Science Society, Pittsburgh, PA, 1995.
B116 Maaß, W.: How Spatial Information Connects Visual Perception and Natural Language Generation in Dynamic Environments: Towards a Computational Model. SFB 314 (VITRA), Bericht Nr. 116, Juni 1995. In: Proceedings of the 2nd International Conference on Spatial Information Theory (COSIT’95), Vienna, September 21-23, 1995, 223–240.
B117 Maaß, W., Baus, J., Paul, J.: Visual Grounding of Route Descriptions in Dynamic Environments. SFB 314 (VITRA), Bericht Nr. 117, Juli 1995. To appear in: Proceedings of the AAAI Fall Symposium on “Computational Models for Integrating Language and Vision”, MIT, Cambridge, MA, USA, 1995.

B118 Gapp, K.-P.: An Empirically Validated Model for Computing Spatial Relations. SFB 314 (VITRA), Bericht Nr. 118, Juli 1995. In: Wachsmuth, I., Rollinger, C.-R., Brauer, W. (eds.): Advances in Artificial Intelligence, Proceedings of the 19th Annual German Conference on Artificial Intelligence (KI-95), Bielefeld, Germany, September 1995, Berlin, Heidelberg: Springer, 245–256.
B119 Gapp, K.-P.: Object Localization: Selection of Optimal Reference Objects. SFB 314 (VITRA), Bericht Nr. 119, Juli 1995. In: Proceedings of the 2nd International Conference on Spatial Information Theory (COSIT’95), Vienna, September 21-23, 1995, 519–536.
B120 Herzog, G.: Coping with Static and Dynamic Spatial Relations. SFB 314 (VITRA), Bericht Nr. 120, Juli 1995. In: Amsili, P., Borillo, M., Vieu, L. (eds.): Proceedings of TSM’95, Time, Space, and Movement: Meaning and Knowledge in the Sensible World, Groupe “Langue, Raisonnement, Calcul”, Toulouse, Château de Bonas, France, 1995, C 47–59.
B121 Blocher, A., Schirra, J.R.J.: Optional Deep Case Filling and Focus Control with Mental Images: ANTLIMA-KOREF. SFB 314 (VITRA), Bericht Nr. 121, Juli 1995. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montréal, Canada, August 19-25, 1995, 417–423.
B122 Herzog, G., Rohr, K.: Integrating Vision and Language: Towards Automatic Description of Human Movements. SFB 314 (VITRA), Bericht Nr. 122, Juli 1995. In: Wachsmuth, I., Rollinger, C.-R., Brauer, W. (eds.): Advances in Artificial Intelligence, Proceedings of the 19th Annual German Conference on Artificial Intelligence (KI-95), Bielefeld, Germany, September 1995, Berlin, Heidelberg: Springer, 257–268.
B123 Blocher, A., Stopp, E.: Time-Dependent Generation of Minimal Sets of Spatial Descriptions. SFB 314 (VITRA), Bericht Nr. 123, Juli 1995. To appear in: Proceedings of the Workshop on the Representation and Processing of Spatial Expressions at the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montréal, Canada, August 19, 1995.
B124 Herzog, G.: From Visual Input to Verbal Output in the Visual Translator. SFB 314 (VITRA), Bericht Nr. 124, Juli 1995. To appear in: Proceedings of the AAAI Fall Symposium on “Computational Models for Integrating Language and Vision”, MIT, Cambridge, MA, USA, 1995.
B125 Jameson, A., Schäfer, R., Simons, J., Weis, T.: Adaptive Provision of Evaluation-Oriented Information: Tasks and Techniques. SFB 314 (PRACMA), Bericht Nr. 125, Juli 1995. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montréal, Canada, August 19-25, 1995, 1886–1893.
B126 Jameson, A.: Logic Is Not Enough: Why Reasoning About Another Person’s Beliefs Is Reasoning Under Uncertainty. SFB 314 (PRACMA), Bericht Nr. 126, Juli 1995. In: Laux, A., Wansing, H. (eds.): Knowledge and Belief in Philosophy and Artificial Intelligence, Berlin: Akademie-Verlag, 1995, 199–229.
B127 Zimmermann, D.: Exploiting Models of Musical Structure for Automatic Intention-Based Composition of Background Music. KI-Labor, Bericht Nr. 127, Juli 1995. In: Proceedings of the workshop on Artificial Intelligence and Music at the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montréal, Canada, official IJCAI-95 workshop proceedings, Menlo Park (CA): AAAI Press, 1995, 55–62.
B128 Ndiaye, A., Jameson, A.: Predictive Role Taking in Dialog: Global Anticipation Feedback Based on Transmutability. SFB 314 (PRACMA), Bericht Nr. 128, November 1995. In: Proceedings of the Fifth International Conference on User Modeling, Kailua-Kona, Hawaii, USA, January 1996, 137–144.
B129 Stopp, E., Laengle, T.: Natürlichsprachliche Instruktionen an einen autonomen Serviceroboter. SFB 314 (VITRA), Bericht Nr. 129, November 1995. In: Dillmann, R., Rembold, U., Lüth, T.: Autonome Mobile Systeme 1995, Tagungsband der “Autonome Mobile Systeme” (AMS ’95), 30. November - 1. Dezember 1995, Berlin, Heidelberg: Springer, 1995, 299–308.
B130 Wahlster, W., Jameson, A., Ndiaye, A., Schäfer, R., Weis, T.: Ressourcenadaptive Dialogführung: ein interdisziplinärer Forschungsansatz. SFB 378 (READY), Bericht Nr. 130, November 1995. In: Künstliche Intelligenz, 9(6), 1995, 17–21.

B131 Jameson, A.: Numerical Uncertainty Management in User and Student Modeling: An Overview of Systems and Issues. SFB 314 (PRACMA), Bericht Nr. 131, November 1995. To appear in: User Modeling and User-Adapted Interaction, 5, 1995.
B132 Längle, T., Lüth, T.C., Stopp, E., Herzog, G.: Natural Language Access to Intelligent Robots: Explaining Automatic Error Recovery. SFB 378 (REAL), Bericht Nr. 132, Oktober 1996. In: Ramsay, A.M. (ed.): Artificial Intelligence: Methodology, Systems, Applications, Proc. of AIMSA’96, Sozopol, Bulgaria, September 1996, Amsterdam: IOS Press, 259–267.
B133 Jameson, A., Weis, T.: How to Juggle Discourse Obligations. SFB 378 (READY), Bericht Nr. 133, Oktober 1996. In: Proceedings of the Symposium on Conceptual and Semantic Knowledge in Language Generation, Heidelberg, Germany, 15-17 November, 1995, 171–185.
B134 Weis, T.: Die Rolle von Diskursverpflichtungen in bewertungsorientierten Informationsdialogen. SFB 378 (READY), Bericht Nr. 134, Oktober 1996. In: Gibbon, D. (ed.): Natural Language Processing and Speech Technology. Results of the 3rd KONVENS Conference, Bielefeld, Germany, October 1996, Berlin: Mouton de Gruyter.
B135 Gapp, K.-P.: Processing Spatial Relations in Object Localization Tasks. SFB 378 (REAL), Bericht Nr. 135, Oktober 1996. In: Proceedings of the AAAI Fall Symposium on “Computational Models for Integrating Language and Vision”, MIT, Cambridge, MA, USA, 1995.
B136 Ndiaye, A.: Rollenübernahme in einem Dialogsystem. SFB 378 (READY), Bericht Nr. 136, Januar 1997. In: Künstliche Intelligenz, 10(4), 1996, 34–40.
B137 Stopp, E., Blocher, A.: Construction of Mental Images and their Use in a Listener Model. SFB 378 (REAL), Bericht Nr. 137, Januar 1997. In: Proceedings of the Symposium on Conceptual and Semantic Knowledge in Language Generation, Heidelberg, Germany, 15-17 November, 1995, 270–280.
B138 Schäfer, R., Weyrath, T.: Assessing Temporally Variable User Properties With Dynamic Bayesian Networks. SFB 378 (READY), Bericht Nr. 138, August 1997. In: Jameson, A., Paris, C., Tasso, C. (eds.): User Modeling: Proceedings of the Sixth International Conference, UM97. Vienna, New York: Springer, 1997, 377–388.
B139 Weis, T.: Resource-Adaptive Action Planning in a Dialogue System for Repair Support. SFB 378 (READY), Bericht Nr. 139, August 1997. To appear in: Nebel, B. (ed.): Proceedings der 21. Deutschen Jahrestagung für Künstliche Intelligenz, Freiburg im Breisgau, Deutschland. Berlin, New York: Springer, 1997.
B140 Herzog, G., Blocher, A., Gapp, K.-P., Stopp, E., Wahlster, W.: VITRA: Verbalisierung visueller Information. SFB 378 (REAL), Bericht Nr. 140, Januar 1998. In: Informatik - Forschung und Entwicklung, 11(1), 1996, 12–19.
B141 Wahlster, W., Blocher, A., Baus, J., Stopp, E., Speiser, H.: Ressourcenadaptierende Objektlokalisation: Sprachliche Raumbeschreibung unter Zeitdruck. SFB 378 (REAL, VEVIAG), Bericht Nr. 141, April 1998. Erscheint in: Kognitionswissenschaft, Sonderheft zum Sonderforschungsbereich 378, Berlin, Heidelberg: Springer, 1998.
B142 Zimmer, H.D., Speiser, H.R., Baus, J., Blocher, A., Stopp, E.: The Use of Locative Expressions in Dependence of the Spatial Relation between Target and Reference Object in Two-Dimensional Layouts. SFB 378 (REAL, VEVIAG), Bericht Nr. 142, April 1998. In: Freksa, C., Habel, C., Wender, K.F. (eds.): Spatial cognition - An interdisciplinary approach to representation and processing of spatial knowledge, Berlin, Heidelberg: Springer, 1998, 223–240.
B143 Wahlster, W., Tack, W.: SFB 378: Ressourcenadaptive Kognitive Prozesse. SFB 378 (REAL), Bericht Nr. 143, April 1998. In: Jarke, M., Pasedach, K., Pohl, K. (Hrsg.): Informatik’97 - Informatik als Innovationsmotor, 27. Jahrestagung der Gesellschaft für Informatik, Aachen, 24.-26. September 1997, Berlin, Heidelberg: Springer, 1997, 51–57.

Memos

M1 Baltes, H., Houy, C., Scheller, A., Schifferer, K.: PORTFIX - Portierung des FRANZ-LISP-Systems von VAX/UNIX nach PC-MX/SINIX. FB Informatik, III-Projekt, Memo Nr. 1, Juli 1985. #

M2 Grasmück, R., Guldner, A.: Wissensbasierte Fertigungsplanung in Stanzereien mit FERPLAN: Ein Systemüberblick. FB Informatik, KI-Labor, Memo Nr. 2, August 1985. #

M24 Schmauks, D., Reithinger, N.: Generierung multimodaler Ausgabe in NL Dialogsystemen — Voraussetzungen, Vorteile und Probleme. SFB 314 (XTRA), Memo Nr. 24, April 1988. #

M3 Baltes, H.: GABI — Ein wissensbasiertes Geldanlageberatungsprogramm. FB Informatik, KI-Labor, Memo Nr. 3, November 1985. #

M25 Herzog, G., Rist, T.: Simultane Interpretation und natürlichsprachliche Beschreibung zeitveränderlicher Szenen: Das System SOCCER. SFB 314 (VITRA), Memo Nr. 25, August 1988. #
M26 André, E.: Generierung natürlichsprachlicher Äußerungen zur simultanen Beschreibung von zeitveränderlichen Szenen: Das System SOCCER. SFB 314 (VITRA), Memo Nr. 26, August 1988. #

M4 Schmauks, D.: Formulardeixis und ihre Simulation auf dem Bildschirm. Ein Überblick aus linguistischer Sicht. SFB 314 (XTRA), Memo Nr. 4, Februar 1986. In: Conceptus 55, 1987, 83–102. #
M5 André, E., Bosch, G., Herzog, G., Rist, T.: Characterizing Trajectories of Moving Objects Using Natural Language Path Descriptions. SFB 314 (VITRA), Memo Nr. 5, März 1986. In: ECAI 86. The 7th European Conference on Artificial Intelligence. Proceedings, Vol. 2. Brighton 1986, 1–8. #

M27 Kemke, C.: Die Modellierung neuronaler Verbände basierend auf Netzwerken endlicher Automaten. FB Informatik, KI-Labor, Memo Nr. 27, August 1988. In: Tagungsband des Workshops “Konnektionismus”, St. Augustin 1988.

M6 Baltes, H.: GABI: Frame-basierte Wissensrepräsentation in einem Geldanlageberatungssystem. FB Informatik, KI-Labor, Memo Nr. 6, März 1986. #

M28 Hecking, M., Kemke, C., Nessen, E., Dengler, D., Gutmann, M., Hector, G.: The SINIX Consultant — A Progress Report. FB Informatik, KI-Labor (SC-Projekt), Memo Nr. 28, August 1988.

M7 Brach, U., Woll, W.: WILIE — Ein wissensbasiertes System zur Vereinfachung der interaktiven Literaturerfassung. FB Informatik, KI-Labor, Memo Nr. 7, April 1986. #

M29 Bauer, M., Diewald, G., Merziger, G., Wellner, I.: REPLAI. Ein System zur inkrementellen Intentionserkennung in Realwelt-Szenen. SFB 314 (VITRA), Memo Nr. 29, Oktober 1988. #

M8 Finkler, W., Neumann, G.: MORPHIX — Ein hochportabler Lemmatisierungsmodul für das Deutsche. FB Informatik, KI-Labor, Memo Nr. 8, Juli 1986.

M30 Kalmes, J.: SB-Graph User Manual (Release 0.1). SFB 314 (XTRA), Memo Nr. 30, Dezember 1988.

M9 Portscheller, R.: AIDA — Rekursionsbehandlung, Konfliktauflösung und Regelcompilierung in einem Deduktiven Datenbanksystem. FB Informatik, KI-Labor, Memo Nr. 9, Oktober 1986. #
M10 Schirra, J.: MEGA-ACT — Eine Studie über explizite Kontrollstrukturen und Pseudoparallelität in Aktor-Systemen mit einer Beispielarchitektur in FRL. FB Informatik, KI-Labor, Memo Nr. 10, August 1986. #
M11 Allgayer, J., Reddig, C.: Systemkonzeption zur Verarbeitung kombinierter sprachlicher und gestischer Referentenbeschreibungen. SFB 314 (XTRA), Memo Nr. 11, Oktober 1986. #
M12 Herzog, G.: Ein Werkzeug zur Visualisierung und Generierung von geometrischen Bildfolgenbeschreibungen. SFB 314 (VITRA), Memo Nr. 12, Dezember 1986. #
M13 Retz-Schmidt, G.: Deictic and Intrinsic Use of Spatial Prepositions: A Multidisciplinary Comparison. SFB 314 (VITRA), Memo Nr. 13, Dezember 1986. In: Kak, A., Chen, S.-S. (eds.): Spatial Reasoning and Multisensor Fusion, Proceedings of the 1987 Workshop. Los Altos, CA: Morgan Kaufmann, 1987, 371–380. #
M14 Harbusch, K.: A First Snapshot of XTRAGRAM, A Unification Grammar for German Based on PATR. SFB 314 (XTRA), Memo Nr. 14, Dezember 1986. #
M15 Fendler, M., Wichlacz, R.: SYCON — Symbolische Constraint-Propagierung auf Netzwerken, Entwurf und Implementierung. FB Informatik, KI-Labor, Memo Nr. 15, März 1987.
M16 Dengler, D., Gutmann, M., Hector, G., Hecking, M.: Der Planerkenner REPLIX. FB Informatik, KI-Labor (SC-Projekt), Memo Nr. 16, September 1987. #
M17 Hecking, M., Harbusch, K.: Plan Recognition through Attribute Grammars. FB Informatik, KI-Labor (SC-Projekt), Memo Nr. 17, September 1987.
M18 Nessen, E.: SC-UM — User Modeling in the SINIX-Consultant. FB Informatik, KI-Labor (SC-Projekt), Memo Nr. 18, November 1987. In: Applied Artificial Intelligence, 3,1, 1989, 33–44. #
M19 Herzog, G.: LISPM Miniatures, Part I. SFB 314 (VITRA), Memo Nr. 19, November 1987.
M20 Blum, E.J.: ROSY — Menü-basiertes Parsing natürlicher Sprache unter besonderer Berücksichtigung des Deutschen. FB Informatik, KI-Labor, Memo Nr. 20, Dezember 1987.
M21 Beiche, H.-P.: Zusammenwirken von LST-1, PLUG, FORMULAR und MINI-XTRA. SFB 314 (XTRA), Memo Nr. 21, Januar 1988. #
M22 Allgayer, J., Harbusch, K., Kobsa, A., Reddig, C., Reithinger, N., Schmauks, D., Wahlster, W.: Arbeitsbericht des Projektes NS1: XTRA für den Förderungszeitraum vom 1.4.85 bis 31.12.87. SFB 314 (XTRA), Memo Nr. 22, Januar 1988. #
M23 Kobsa, A.: A Bibliography of the Field of User Modeling in Artificial Intelligence Dialog Systems. SFB 314 (XTRA), Memo Nr. 23, April 1988.

M31 Kobsa, A.: The SB-ONE Knowledge Representation Workbench. SFB 314 (XTRA), Memo Nr. 31, März 1989. #
M32 Aue, D., Heib, S., Ndiaye, A.: SB-ONE Matcher: Systembeschreibung und Benutzeranleitung. SFB 314 (XTRA), Memo Nr. 32, März 1989.
M33 Profitlich, H.-J.: Das SB-ONE Handbuch. Version 1.0. SFB 314 (XTRA), Memo Nr. 33, April 1989.
M34 Bauer, M., Merziger, G.: Conditioned Circumscription: Translating Defaults to Circumscription. FB Informatik, KI-Labor (SC-Projekt), Memo Nr. 34, Mai 1989.
M35 Scheller, A.: PARTIKO. Kontextsensitive, wissensbasierte Schreibfehleranalyse und -korrektur. FB Informatik, KI-Labor, Memo Nr. 35, Mai 1989.
M36 Wille, M.: TACTILUS II. Evaluation und Ausbau einer Komponente zur Simulation und Analyse von Zeigegesten. SFB 314 (XTRA), Memo Nr. 36, August 1989. #

M37 Müller, S.: CITYGUIDE — Wegauskünfte vom Computer. SFB 314 (VITRA), Memo Nr. 37, September 1989.
M38 Reithinger, N.: Dialogstrukturen und Dialogverarbeitung in XTRA. SFB 314 (XTRA), Memo Nr. 38, November 1989.
M39 Allgayer, J., Jansen-Winkeln, R., Kobsa, A., Reithinger, N., Reddig, C., Schmauks, D.: XTRA. Ein natürlichsprachliches Zugangssystem zu Expertensystemen. SFB 314 (XTRA), Memo Nr. 39, Dezember 1989. #
M40 Beiche, H.-P.: LST-1. Ein Expertensystem zur Unterstützung des Benutzers bei der Durchführung des Lohnsteuerjahresausgleichs. SFB 314 (XTRA), Memo Nr. 40, Dezember 1989.
M41 Dengler, D.: Referenzauflösung in Dialogen mit einem intelligenten Hilfesystem. FB Informatik (KI-Labor), Memo Nr. 41, Januar 1990. #
M42 Kobsa, A.: Conceptual Hierarchies in Classical and Connectionist Architecture. SFB 314 (XTRA), Memo Nr. 42, Januar 1990.
M43 Profitlich, H.-J.: SB-ONE: Ein Wissensrepräsentationssystem basierend auf KL-ONE. SFB 314 (XTRA), Memo Nr. 43, März 1990.
M44 Kalmes, J.: SB-Graph. Eine graphische Benutzerschnittstelle für die Wissensrepräsentationswerkbank SB-ONE. SFB 314 (XTRA), Memo Nr. 44, März 1990.
M45 Jung, J., Kresse, A., Schäfer, R.: ZORA. Ein Zeigegestengeneratorprogramm. SFB 314 (XTRA), Memo Nr. 45, April 1990.
M46 Harbusch, K., Huwig, W.: XTRAGRAM. A Unification Grammar for German Based on PATR. SFB 314 (XTRA), Memo Nr. 46, April 1990. #
M47 Scherer, J.: SB-PART Handbuch. Version 1.0. SFB 314 (XTRA), Memo Nr. 47, Juli 1990.
M48 Scherer, J.: SB-PART: ein Partitionsverwaltungssystem für die Wissensrepräsentationssprache SB-ONE. SFB 314 (XTRA), Memo Nr. 48, September 1990. #

M49 Schirra, J.R.J.: Zum Nutzen antizipierter Bildvorstellungen bei der sprachlichen Szenenbeschreibung. — Ein Beispiel —. SFB 314 (VITRA), Memo Nr. 49, Dezember 1991. #

M75 Berthold, A.: Repräsentation und Verarbeitung sprachlicher Indikatoren für kognitive Ressourcenbeschränkungen. SFB 378 (READY), Memo Nr. 75, Mai 2001.

M50 Blocher, A., Stopp, E., Weis, T.: ANTLIMA-1: Ein System zur Generierung von Bildvorstellungen ausgehend von Propositionen. SFB 314 (VITRA), Memo Nr. 50, März 1992.

M76 Gebhard, P.: ROPLEX: Natürlichsprachliche Beschreibung von generischen Roboterplandaten. SFB 378 (REAL), Memo Nr. 76, Mai 2001.

M51 Allgayer, J., Franconi, E.: Collective Entities and Relations in Concept Languages. SFB 314 (PRACMA), Memo Nr. 51, Juni 1992.

M77 Werner, A.: EBABA: Probabilistische Einschätzung von Bewertungskriterien aufgrund bewertender Äußerungen. SFB 378 (READY), Memo Nr. 77, Mai 2001.

M52 Allgayer, J., Schmitt, R.: SB·LITTERS: Die Zugangssprache zu SB-ONE+. SFB 314 (PRACMA), Memo Nr. 52, August 1992.
M53 Herzog, G.: Visualization Methods for the VITRA Workbench. SFB 314 (VITRA), Memo Nr. 53, Dezember 1992.
M54 Wazinski, P.: Graduated Topological Relations. SFB 314 (VITRA), Memo Nr. 54, Mai 1993.
M55 Jameson, A., Schäfer, R.: Probabilistische Einschätzung von Wissen und Interessen. Die Anwendung der intuitiven Psychometrik im Dialogsystem PRACMA. SFB 314 (PRACMA), Memo Nr. 55, Juni 1993.
M56 Schneider, A.: Konnektionistische Simulation adaptiver Leistungen des Flugsteuersystems der Wanderheuschrecke. KI-Labor, Memo Nr. 56, Juni 1993.
M57 Kipper, B., Ndiaye, A., Reithinger, N., Reddig, C., Schäfer, R.: Arbeitsbericht für den Zeitraum 1991–1993: PRACMA. SFB 314 (PRACMA), Memo Nr. 57, Juli 1993.
M58 Herzog, G., Schirra, J., Wazinski, P.: Arbeitsbericht für den Zeitraum 1991–1993: VITRA. SFB 314 (VITRA), Memo Nr. 58, Juli 1993.
M59 Gapp, K.-P.: Berechnungsverfahren für räumliche Relationen in 3D-Szenen. SFB 314 (VITRA), Memo Nr. 59, August 1993.
M60 Stopp, E.: GEO-ANTLIMA: Konstruktion dreidimensionaler mentaler Bilder aus sprachlichen Szenenbeschreibungen. SFB 314 (VITRA), Memo Nr. 60, Oktober 1993.
M61 Blocher, A.: KOREF: Zum Vergleich intendierter und imaginierter Äußerungsgehalte. SFB 314 (VITRA), Memo Nr. 61, Mai 1994.
M62 Paul, M.: IBEE: Ein intervallbasierter Ansatz zur Ereigniserkennung für die inkrementelle Szenenfolgenanalyse. SFB 314 (VITRA), Memo Nr. 62, November 1994.
M63 Stopp, E., Blocher, A.: Spatial Information in Instructions and Questions to an Autonomous System. SFB 378 (REAL), Memo Nr. 63, Mai 1997.

M64 Stopp, E.: Ein Modell für natürlichsprachlichen Zugang zu autonomen Robotern. SFB 378 (REAL), Memo Nr. 64, Mai 1997.
M65 Schäfer, R., Bauer, M. (Hrsg.): ABIS-97, 5. GI-Workshop Adaptivität und Benutzermodellierung in interaktiven Softwaresystemen, 30.9. bis 2.10.1997, Saarbrücken. SFB 378 (READY), Memo Nr. 65, September 1997.
M66 Rupp, U.: GRATOR - Räumliches Schließen mit GRAdierten TOpologischen Relationen über Punktmengen. SFB 378 (REAL), Memo Nr. 66, April 1998.
M67 Kray, C.: Ressourcenadaptierende Verfahren zur Präzisionsbewertung von Lokalisationsausdrücken und zur Generierung von linguistischen Hecken. SFB 378 (REAL), Memo Nr. 67, April 1998.
M68 Müller, C.: Das REAL Speech Interface. SFB 378 (REAL), Memo Nr. 68, Januar 1999.
M69 Wittig, F.: Ein Java-basiertes System zum Ressourcenmanagement in Anytime-Systemen. SFB 378 (REAL), Memo Nr. 69, Januar 1999.
M70 Lindmark, K.: Identifying Symptoms of Time Pressure and Cognitive Load in Manual Input. SFB 378 (READY), Memo Nr. 70, September 2000.
M71 Beckert, A.: Kompilation von Anytime-Algorithmen: Konzeption, Implementation und Analyse. SFB 378 (REAL), Memo Nr. 71, Mai 2001.
M72 Beckert, A.: ORCAN: Laufzeitmessungen von Methoden zur Anytime-Kompilierung. SFB 378 (REAL), Memo Nr. 72, Mai 2001.
M73 Baus, A., Beckert, A.: ORCAN: Implementation von Methoden zur Compilierung von Anytime-Algorithmen. SFB 378 (REAL), Memo Nr. 73, Mai 2001.
M74 Weyrath, T.: Erkennung von Arbeitsgedächtnisbelastung und Zeitdruck in Dialogen - Empirie und Modellierung mit Bayes’schen Netzen. SFB 378 (READY), Memo Nr. 74, Mai 2001.

M78 Brandherm, B.: Rollup-Verfahren für komplexe dynamische Bayes’sche Netze. SFB 378 (READY), Memo Nr. 78, Mai 2001.
M79 Lohse, M.: Ein System zur Visualisierung von Navigationsauskünften auf stationären und mobilen Systemen. SFB 378 (REAL), Memo Nr. 79, Mai 2001.
M80 Müller, C.: Symptome von Zeitdruck und kognitiver Belastung in gesprochener Sprache: eine experimentelle Untersuchung. SFB 378 (READY), Memo Nr. 80, Mai 2001.
M81 Decker, B.: Implementation von Lernverfahren für Bayes’sche Netze mit versteckten Variablen. SFB 378 (READY), Memo Nr. 81, Mai 2001.
M82 Krüger, A., Malaka, R. (Eds.): Artificial Intelligence in Mobile Systems 2003 (AIMS 2003). SFB 378 (REAL), Memo Nr. 82, Oktober 2003.
M83 Butz, A., Kray, C., Krüger, A., Schmidt, A. (Eds.): Workshop on Multi-User and Ubiquitous User Interfaces 2004 (MU3I 2004). SFB 378 (REAL), Memo Nr. 83, Januar 2004.
M84 Baus, J., Kray, C., Porzel, R. (Eds.): Artificial Intelligence in Mobile Systems 2004 (AIMS 2004). SFB 378 (REAL), Memo Nr. 84, September 2004.