The Virtual Room Inhabitant - Intuitive Interaction With Intelligent Environments

Michael Kruppa, Lübomira Spassova, Michael Schmitz
Saarland University, Stuhlsatzenhausweg B. 36.1, 66123 Saarbrücken, Germany
{mkruppa, mira, schmitz}@cs.uni-sb.de

Abstract. In this paper we describe a new way to improve the usability of complex hardware setups in Intelligent Environments. By introducing a virtual character, we facilitate intuitive interaction with our Intelligent Environment. The character is capable of freely moving along the walls of the room and is aware of the user's position and orientation within the room. In this way, it may offer situated assistance as well as unambiguous references to physical objects by means of combined gestures, speech and physical locomotion. We make use of a steerable projector and a spatial audio system in order to position the character within the environment.

Keywords Intelligent Environments, Life-like Characters, Location-Aware Interaction

1 INTRODUCTION

Intelligent Environments physically combine several different input and output devices. These devices are spread all over the environment, and some may even be hidden in it. Furthermore, Intelligent Environments often offer a variety of different devices or combinations of devices capable of fulfilling a specific task. However, users in Intelligent Environments are often unaware of the different possibilities offered by the environment. Hence, in order to optimize the users' benefit in the Intelligent Environment, it is necessary to find an interaction metaphor which pro-actively informs users about the technology hidden in the environment and the services available to them.

The solution discussed in this paper uses a virtual character, capable of roaming on an arbitrary surface in an Intelligent Environment, in order to maximize the usability of the Intelligent Environment as a whole. As Towns et al. [1] have shown, virtual characters capable of performing judicious combinations of speech, locomotion and gestures are very effective in providing unambiguous, real-time advice in a virtual 3D environment. The goal of the project discussed in this paper is to transfer the concept of deictic believability [1] of virtual characters in virtual 3D worlds to the physical world, by allowing a virtual character to "freely" move within physical space. The idea of a Virtual Room Inhabitant (VRI) is to let the character appear as an expert within the environment which is always available and aware of the state of each device. Our concept of a virtual character "living" within the Intelligent Environment, and thus playing the role of a host, allows both novice and advanced users to efficiently interact with the different devices and services integrated within the Intelligent Environment. The character is capable of welcoming a first-time visitor, and its main purpose is to explain the setup of the environment and to help users while interacting with the Intelligent Environment. Furthermore, the character is aware of the user's position and orientation and is hence capable of positioning itself accordingly. In a next step, we will enable the character to suggest a number of devices to be used in order to fulfill a certain task for the user. In case several users are interacting with the Intelligent Environment simultaneously, the character will also be able to support users in sharing devices, and in doing so, it will maximize the benefit of all users.

2 RELATED WORK

In the last few years, the number of projects dealing with Intelligent Environments has increased significantly. While many of these projects focused on the technical setup of these environments (see [2] and [3]), the demand for innovative, functional user interfaces for Intelligent Environments has constantly risen and has thus become a central point within this research area. In [4], the authors state their vision of an Intelligent Workspace, consisting of a network of agents capable of organizing the combined usage of many different devices to maximize the benefit of all users. The proposed agent network allows the system to dynamically react to changing hardware setups within the environment (see [5]). The importance of integrating additional dimensions, like geographical information as well as the state and position of each user, within the information space of an Intelligent Environment is discussed in [6]. Knowing a user's position and orientation may help to identify optimal devices or device combinations when performing a specific task within the Intelligent Environment. The importance and effectiveness of virtual characters capable of performing deictically believable gestures, in order to provide problem-solving advice, has been discussed in a number of publications (e.g. [7], [1]). These virtual characters have proven to efficiently reduce the ambiguity of references to virtual objects within the virtual environment. In our approach, we combine the idea of a network of agents with the geographical (i.e. positional) information derived from our outdoor/indoor navigation system [8]. Furthermore, we introduce a virtual character in our scenario, which is capable of giving unambiguous references to physical objects by means of speech, gestures and physical locomotion.

Fig. 1. The VRI beside a plasma screen, and the VRI system components

3 REALIZATION

In order to realize our vision of a life-like character "living" in our Intelligent Environment, several software and hardware components were combined (see Figure 1). Each device has to be registered on our device manager as a service. The device manager, in combination with a presentation manager, grants access to all registered devices. In this way, we are able to share our devices between several applications running simultaneously. The remote access mechanism is realized with Java Remote Method Invocation (RMI, http://java.sun.com/products/jdk/rmi/) objects, which allow arbitrary applications to control remote devices as if they were locally connected. We use two kinds of senders to detect user positions: infrared beacons (IR beacons) and active Radio Frequency Identification tags (RFID tags). Most positioning systems equip the user with infrared badges or RFID tags and put the respective sensors on the wall or ceiling to detect the presence and/or proximity of a user. In our approach, the beacons and tags are mounted at the ceiling and the user carries a Personal Digital Assistant (PDA) with integrated sensors (the standard infrared port of the PDA and a PCMCIA active RFID reader card). A received (or sensed) tag or beacon indicates that a user is near the location of the respective sender, or near a coordinate that lies in the area of the sender or receiver (in the case of the IR beacons, which need a direct line of sight, the orientation of the user/PDA is also determined). The active RFID tags have a memory of 64 bytes, of which 56 bytes can be freely used (read and write). We use this memory to store the coordinate of the tag. Each IR beacon is combined with one RFID tag that provides the coordinates of the beacon and the tag itself. When a tag or beacon is sensed by the PDA, a geo referenced dynamic Bayesian network (geoDBN [9]) is instantiated ("induced") and associated with the coordinate that is stored in its memory.
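To make the sensing step more concrete, the following sketch shows how a PDA-side component might instantiate and refresh geoDBNs for sensed senders. The class and method names are placeholders introduced for illustration; the real system performs proper Bayesian belief updates rather than the simple weight increment shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the PDA-side sensing step (placeholder classes, not the actual implementation).
public class GeoDbnSensingSketch {

    // Simple stand-in for a geo referenced dynamic Bayesian network.
    static class GeoDbn {
        final double[] coordinate;   // coordinate read from the tag's memory
        double weight = 1.0;         // belief that the user is near this coordinate
        GeoDbn(double[] coordinate) { this.coordinate = coordinate; }
    }

    private final Map<String, GeoDbn> activeNetworks = new HashMap<>();

    // Called for every sensed IR beacon or RFID tag with the coordinate stored in its memory.
    void onSenderReceived(String senderId, double[] storedCoordinate) {
        // Instantiate ("induce") a geoDBN for newly sensed senders; for senders that are
        // already known, the weight is simply reinforced here (the real system updates the
        // network's belief instead).
        activeNetworks.computeIfAbsent(senderId, id -> new GeoDbn(storedCoordinate)).weight += 1.0;
    }

    Map<String, GeoDbn> networks() {
        return activeNetworks;
    }
}
```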

The position of the user is then calculated from the weighted combination of the coordinates at which geoDBNs exist:

$$\mathit{UserPos}_t = \alpha \sum_{i=1}^{n} w(\mathrm{GeoDBN}[i]) \cdot \mathrm{Coord}(\mathrm{GeoDBN}[i])$$

Here, n is the number of existing geoDBNs at time t (n ≥ NumberOfReceivedSenders_t), Coord(GeoDBN[i]) is the coordinate and w(GeoDBN[i]) the weight of the i-th geoDBN, and α is a normalization factor that ensures that the sum of all weights multiplied by α is one. The calculated position is then forwarded by the PDA via wireless LAN to an Event Heap, a tuplespace infrastructure that provides convenient mechanisms for storing and retrieving collections of type-value fields [10]. On this Event Heap, we collect all kinds of information retrieved within the environment (i.e. user positions, interactions with the system). Our central component, the character engine, monitors the Event Heap and automatically reacts to changing user positions. The Virtual Room Inhabitant implementation combines a character engine with a spatial audio system and a steerable projector to allow the character to freely move within the room (i.e. move along the walls of the room). These three main parts of our implementation will be explained in detail in the following subsections. However, we will start with a short description of the setup of our Intelligent Environment.
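As an illustration of the formula above, a minimal Java sketch of the weighted combination and of forwarding the estimate to the Event Heap could look as follows. GeoDbn is a stand-in class as in the previous sketch, EventHeapClient is a placeholder interface rather than the actual Event Heap API, and two-dimensional coordinates are assumed for brevity.

```java
import java.util.Collection;

// Sketch of the position estimate UserPos_t = alpha * sum_i w(GeoDBN[i]) * Coord(GeoDBN[i]).
public class PositionFusionSketch {

    // Stand-in for a geoDBN with its stored coordinate and current weight.
    static class GeoDbn {
        double[] coordinate;
        double weight;
        GeoDbn(double x, double y, double weight) {
            this.coordinate = new double[] { x, y };
            this.weight = weight;
        }
    }

    // Placeholder for the tuplespace client used to publish events (not the real API).
    interface EventHeapClient {
        void postEvent(String type, double x, double y);
    }

    static double[] estimatePosition(Collection<GeoDbn> networks) {
        double weightSum = 0.0;
        for (GeoDbn dbn : networks) {
            weightSum += dbn.weight;
        }
        double alpha = 1.0 / weightSum;          // normalization: alpha times the sum of weights is one
        double[] position = new double[2];
        for (GeoDbn dbn : networks) {
            position[0] += alpha * dbn.weight * dbn.coordinate[0];
            position[1] += alpha * dbn.weight * dbn.coordinate[1];
        }
        return position;
    }

    static void publish(EventHeapClient heap, double[] userPos) {
        // The PDA forwards the estimate via wireless LAN so that the CE-server can react to it.
        heap.postEvent("UserPosition", userPos[0], userPos[1]);
    }
}
```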

3.1 The Instrumented Room

The testbed for our VRI implementation is an instrumented room with various types of devices. There is an instrumented desk, i.e. an ordinary table onto which an interface is projected. The user is able to interact with the interface by means of a ceiling-mounted camera that recognizes simple gestures such as activating projected buttons. Another application in this environment is the Smart Shopping Assistant, consisting of a Smart Shelf and a Smart Shopping Cart. The assistant recognizes which products are taken out of the shelf or put into the shopping cart using RFID technology, and gives the user information about them or suggests recipes that can be cooked with the chosen products. A spatial audio system (SAFIR) and a steerable projector and camera unit (Fluid Beam) are integrated in the instrumented room and enable the realization of the VRI. Both systems are described in detail later in this paper. IR beacons and RFID tags are attached to the ceiling to allow for an accurate positioning service within the room.
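As a rough illustration of how such RFID-based detection can work, the following sketch compares consecutive shelf scans as sets of tag IDs. The scan interface and event handling are assumptions made for illustration; the actual Smart Shopping Assistant is not described at this level of detail here.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of detecting taken and returned products by diffing consecutive RFID scans.
public class SmartShelfSketch {

    private Set<String> lastScan = new HashSet<>();

    // Called with the set of RFID tags currently visible to the shelf antennas.
    void onShelfScan(Set<String> currentScan) {
        Set<String> taken = new HashSet<>(lastScan);
        taken.removeAll(currentScan);            // tags that disappeared -> products taken out
        Set<String> returned = new HashSet<>(currentScan);
        returned.removeAll(lastScan);            // tags that appeared -> products put back
        taken.forEach(id -> System.out.println("Product taken: " + id));
        returned.forEach(id -> System.out.println("Product returned: " + id));
        lastScan = new HashSet<>(currentScan);
    }
}
```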

3.2 Character Engine

The character engine consists of two parts: the character engine server (CE-server), written in Java, and the character animation, which was realized with Macromedia Flash MX (http://www.macromedia.com/software/flash/). These two components are connected via an XML socket connection. The CE-server controls the Flash animation by sending XML commands/scripts. The Flash animation also uses the XML socket connection to send updates on the current state of the animation to the CE-server (i.e. whenever a part of an animation is started/finished). The character animation itself consists of ~9000 still images rendered with Discreet 3D Studio Max (http://www4.discreet.com/3dsmax/), which were transformed into Flash animations. To cope with the immense memory consumption of running such a huge Flash animation, we divided the animation into 17 subparts. The first consists of default and idle animations; the remaining sixteen are combinations of character gestures, for example: shake, nod, look behind, etc. Each animation includes a lip movement loop, so that we are able to let the character talk in almost any position or while performing an arbitrary gesture. A toplevel movie controls these movie parts. Initially (i.e. when we start the character engine), we load the default movie. Whenever there is a demand for a certain gesture (or a sequence of gestures), the CE-server sends the corresponding XML script to the toplevel Flash movie, which then sequentially loads the corresponding gesture movies. The following is a short example of an XML script for the character engine:

    gesture=LookFrontal    sound=welcome1.mp3
    gesture=Hips           sound=welcome2.mp3
    gesture=swirl          sound=swirlsound
    gesture=LookFrontal    sound=cart.mp3
    gesture=PointDownLeft  sound=cart2.mp3
    gesture=swirl          sound=swirlsound
    gesture=PointLeft      sound=panel.mp3
    gesture=swirl          sound=swirlsound

Each script part is enclosed by a script tag. After a script part has been successfully performed by the VRI animation, the CE-server initiates the next step (i.e. moves the character to another physical location by moving the steerable projector and repositioning the voice of the character on the spatial audio system, or instructs the VRI animation to perform the next presentation part). In order to guarantee a smooth character animation, we defined certain frames in the default animation as possible exit points. On these frames, the character is in exactly the same position as on each initial frame of the gesture animations.
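The following sketch illustrates the CE-server side of such an XML socket connection. The host, port and the exact XML element and attribute names are assumptions made for illustration (the paper only states that gesture/sound pairs are enclosed by script tags); the zero byte terminating each message is the delimiter expected by Flash's XMLSocket.

```java
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Minimal sketch of the CE-server sending an XML script to the Flash animation.
public class CharacterEngineClientSketch {

    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 9090);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {

            // Assumed XML layout: one element per script part, gesture and sound as attributes.
            String script =
                "<presentation>"
                + "<script gesture=\"LookFrontal\" sound=\"welcome1.mp3\"/>"
                + "<script gesture=\"Hips\" sound=\"welcome2.mp3\"/>"
                + "<script gesture=\"swirl\" sound=\"swirlsound\"/>"
                + "</presentation>";

            out.write(script);
            out.write('\0');   // Flash's XMLSocket treats a zero byte as the end of a message
            out.flush();
        }
    }
}
```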

Fig. 2. System initialization steps for the VRI and the steerable projector

Each gesture animation also has an exit frame. As soon as this frame is reached, we unload the gesture animation to free the memory and instead continue with the default movie or load another gesture movie, depending on the active XML script. In addition to its animation control function, the CE-server also requests appropriate devices from the presentation planner. Once access to these devices has been granted, the CE-server controls the spatial audio device, the steerable projector and the anti-distortion software. The two devices, together with the anti-distortion software, are synchronized by commands generated by the CE-server, in order to allow the character to appear at any position along the walls of the instrumented room and to allow the origin of the character's voice to be exactly where the character's visual representation is. Presentations within our Intelligent Environment are triggered by the user's movements within our lab. Our users are equipped with a PDA which is used for an outdoor/indoor navigation task. The indoor positional information retrieved by the PDA is forwarded to our Event Heap. On this heap, we store/publish all kinds of information which might be important for services running within the Intelligent Environment. As soon as a user enters the instrumented room, the CE-server recognizes the corresponding information on the Event Heap. In a next step, the CE-server requests access to the devices needed for the VRI. Provided access to these devices is granted by the presentation manager (otherwise the server repeats the request after a while), the CE-server generates a virtual display on the anti-distortion software and starts a screen capture stream, capturing the character animation, which is then mapped onto the virtual display (details will be explained below). It also moves the steerable projector and the spatial audio source to an initial position.

As a final step, the CE-server sends an XML script to the character animation, which results in a combination of several gestures performed by the character while synchronized MP3 files (synthesized speech) are played over the spatial audio device. The whole initialization process is indicated in Figure 2.
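A condensed sketch of this start-up sequence (cf. Figure 2) is given below. All interfaces and method names are placeholders chosen for illustration; they do not reflect the actual presentation manager or device APIs.

```java
// Sketch of the VRI start-up sequence; all interfaces are placeholders.
public class VriStartupSketch {

    interface PresentationManager { boolean requestDevices(String... deviceNames); }
    interface SteerableProjector   { void moveTo(double pan, double tilt); }
    interface SpatialAudio         { void moveSource(String id, double x, double y, double z); }
    interface AntiDistortion       { void createVirtualDisplay(String screenCaptureStream); }
    interface CharacterAnimation   { void sendScript(String xmlScript); }

    void startVri(PresentationManager pm, SteerableProjector projector,
                  SpatialAudio audio, AntiDistortion warp, CharacterAnimation anim)
            throws InterruptedException {
        // Keep asking the presentation manager until the required devices are free.
        while (!pm.requestDevices("steerableProjector", "spatialAudio", "antiDistortion")) {
            Thread.sleep(5000);                        // retry the request after a while
        }
        warp.createVirtualDisplay("vri-capture");      // map the captured animation onto a virtual display
        projector.moveTo(0.0, 45.0);                   // initial pan/tilt, e.g. near the entrance
        audio.moveSource("vri-voice", 1.0, 2.0, 1.7);  // co-locate the voice with the projection
        anim.sendScript("<presentation>"
                + "<script gesture=\"LookFrontal\" sound=\"welcome1.mp3\"/>"
                + "</presentation>");                  // welcome the user
    }
}
```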

3.3 Steerable Projector and Camera Unit (Fluid Beam)

A device consisting of an LCD projector and a digital camera placed in a movable unit is used to visualize the virtual character (see Figure 2). The movable unit (Moving Yoke) is mounted on the ceiling of the Instrumented Environment and can be rotated horizontally (pan) and vertically (tilt). In this way it is possible to project onto any wall or desk surface in the Instrumented Environment. The digital camera can provide high-resolution images or a low-resolution video stream, which are used to recognize optical markers or simple gestures. As the projection surfaces are usually not perpendicular to the projector beam, the resulting image would appear distorted. In order to correct this distortion, we apply a method described in [11]. It is based on the fact that projection is a geometrical inversion of the process of taking a picture, given that the camera and the projector have the same optical parameters and the same position and orientation (see Figure 3). The implementation of this approach requires an exact 3D model of the Instrumented Environment, in which the projector is replaced by a virtual camera. Both the intrinsic (i.e. field of view) and the extrinsic (i.e. position and orientation) parameters of the virtual camera must match those of the given projector.
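The intrinsic side of this matching can be illustrated with a standard perspective matrix built from the projector's field of view; rendering the textured room model through a virtual camera with this matrix (and with a view matrix matching the Moving Yoke's position and orientation) yields the pre-distorted image the projector needs. The row-major layout and the concrete parameter values in the sketch are assumptions, not calibration data of the actual device.

```java
// Sketch of building the virtual camera's intrinsic (perspective) matrix.
public final class VirtualCameraIntrinsicsSketch {

    // Standard perspective matrix (row-major) for the given vertical field of view.
    static double[][] perspective(double fovYDegrees, double aspect, double near, double far) {
        double f = 1.0 / Math.tan(Math.toRadians(fovYDegrees) / 2.0);
        return new double[][] {
            { f / aspect, 0, 0,                           0                             },
            { 0,          f, 0,                           0                             },
            { 0,          0, (far + near) / (near - far), 2 * far * near / (near - far) },
            { 0,          0, -1,                          0                             }
        };
    }

    public static void main(String[] args) {
        // Example: a projector with a 30 degree vertical throw angle and a 4:3 image.
        double[][] proj = perspective(30.0, 4.0 / 3.0, 0.1, 20.0);
        System.out.println("f / aspect = " + proj[0][0]);
    }
}
```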

Fig. 3. Distortion correction using the virtual camera method (source image → camera → camera image / projector image → projector → projection / visible image)

In this way we create a kind of virtual layer covering the surfaces of the Instrumented Environment, on which virtual displays can be placed. This layer is visible only in the area that is currently illuminated by the projector beam. Thus the steerable projector serves as a kind of virtual torch, making the virtual displays visible when it is moved in the appropriate direction. The displays are implemented as textured surfaces in the 3D model of the environment, on which images, videos and live video streams can be displayed. The VRI is created as a live video stream texture on a virtual display. Thus it can be animated in real time by the character engine. By moving the virtual display in the 3D model and appropriately moving the steerable projector, the character appears to "hover" along the walls of the room.
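The following sketch shows how the Moving Yoke might be aimed so that the projection follows the virtual display: given the projector's mounting position and a target point on a wall (both in room coordinates, metres), pan and tilt angles are computed with basic trigonometry. The axis conventions and zero positions of the real hardware are assumptions.

```java
// Sketch of aiming the steerable projector at a target point on a wall.
public final class ProjectorAimingSketch {

    // Pan/tilt angles in degrees that point the projector at (tx, ty, tz).
    static double[] aimAt(double px, double py, double pz,   // projector position (ceiling unit)
                          double tx, double ty, double tz) { // target point on the wall
        double dx = tx - px;
        double dy = ty - py;   // vertical offset (negative: target below the ceiling unit)
        double dz = tz - pz;
        double pan  = Math.toDegrees(Math.atan2(dx, dz));                  // rotation around the vertical axis
        double tilt = Math.toDegrees(Math.atan2(-dy, Math.hypot(dx, dz))); // downward tilt
        return new double[] { pan, tilt };
    }

    public static void main(String[] args) {
        // Ceiling-mounted unit at (2.0, 2.5, 2.0), character appearing at (0.0, 1.5, 4.0).
        double[] angles = aimAt(2.0, 2.5, 2.0, 0.0, 1.5, 4.0);
        System.out.printf("pan=%.1f deg, tilt=%.1f deg%n", angles[0], angles[1]);
    }
}
```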

3.4 Spatial Audio

The synthesized speech of the VRI generates audio output that could be played by any device with audio rendering capabilities. For this purpose we developed the Spatial Audio Framework for Instrumented Rooms (SAFIR, [12]), which runs as a service in our environment and allows applications to concurrently spatialize arbitrary sounds in our lab. The CE-server sends the generated MP3 files and the coordinates of the current location of the character to the spatial audio system, which positions the sounds accordingly. The anthropomorphic interface appears considerably more natural when the speech is perceived from the same direction in which the projection is seen. This is particularly helpful in situations when other applications clutter up the acoustic space with additional audio sources at the same time: the spatial attributes of the audio output of the virtual character allow the user to associate the speech with the projection of the avatar more easily. Furthermore, it automatically directs the user's attention to the position of the character when it appears outside the user's field of vision.
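A minimal sketch of this hand-over is shown below, assuming a simplified service interface; SpatialAudioService and its methods are placeholders, not the actual SAFIR API.

```java
// Sketch of the CE-server handing a speech clip and the character's coordinates to the audio service.
public class SpatialSpeechSketch {

    // Placeholder interface standing in for the spatial audio service.
    interface SpatialAudioService {
        // Plays the given MP3 as a point source at (x, y, z) in room coordinates.
        void playAt(String mp3File, double x, double y, double z);
        // Moves an already playing source, e.g. while the character hovers along a wall.
        void moveSource(String mp3File, double x, double y, double z);
    }

    static void speak(SpatialAudioService safir, String mp3File,
                      double charX, double charY, double charZ) {
        // Co-locate the voice with the character's visual representation.
        safir.playAt(mp3File, charX, charY, charZ);
    }
}
```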

4 APPLICATION EXAMPLE

To test our VRI, we integrated it within a shopping and navigation demo running at our lab. In this scenario, users are given a PDA and perform a combined indoor/outdoor navigation task. The idea is to lead the users to an airport and, upon entering the airport facilities, to guide them towards certain duty-free shops until the departure time is close. In our demonstration setup, these shops are represented by different rooms in our lab, one of them being the instrumented room. The VRI plays the role of a host, welcoming visitors and introducing them to the components of the instrumented room. In Figure 1, the VRI is standing beside a display that is mounted on one of the walls. As soon as a user enters the room, the character appears on the wall near the entrance to welcome the user. The next step of the demonstration setup is to interact with the smart shopping assistant mentioned in Section 3.1. The demo involves the instrumented shelf, which recognizes when products are removed, a shopping cart that detects items placed into it, a tablet PC (mounted on the cart) presenting proactive shopping support to the user, and a wall-mounted display showing additional product recommendations. As the user approaches the shelf, the VRI introduces these devices step by step and describes the interaction possibilities available to the user. While explaining the demo, the VRI moves along the walls in order to appear next to the objects it is talking about. To conclude the shopping demo, the VRI is triggered once more to notify users about the imminent boarding of their flight. It appears beside the exit of the room, points to it and instructs the user to proceed to the boarding gate.

5 CONCLUSIONS/FUTURE WORK

In summary, this paper has presented the design, implementation and application of the VRI, a new interaction metaphor for Intelligent Environments. By introducing a life-like character into our Intelligent Environment, we support users while they interact with various devices and services. In a first step, the character is used to explain the technology in our instrumented room to new users. Since the character is capable of moving freely along the walls of the room, it may offer situated information to the users. By integrating sensor technology (i.e. PDAs, IR beacons and RFID tags), the system may estimate the position of the user in the room and react accordingly. While in the first phase of the project we concentrated on the technical realization of the VRI, in the second phase we will focus on the behavior and interactivity of the character. In a next step, we will integrate a speech recognizer which is capable of analyzing audio signals streamed from a PDA via wireless LAN to a server (based on the approach described in [13]). To adapt the character's behavior to the user, we will integrate a combination of an interaction history and an external user model. While the interaction history will allow the character engine to adapt presentations by relating to previously presented information, the external user model (which will be available to specific services on the internet) will allow the system to adapt to general preferences the user shares with the system (for example, a user might prefer to always use the headphones attached to his PDA when in a public situation, instead of a public audio system). To further improve the flexibility of the approach, we will also allow the character to migrate from the environment to the user's PDA (this technology has already been used in a former project and is well established, see [14]). In this way, the character will be capable of presenting personalized information to the user while others are in the same room. Additionally, the character on the PDA will be able to guide users to different locations in situations (i.e. locations) where the character is unable to move along the walls (as discussed in [15]). The VRI has been successfully tested during many different presentations and has been welcoming almost every visitor coming to our lab ever since it was installed. We believe that the VRI is a promising first step towards an intuitive interaction method for Intelligent Environments.

6 ACKNOWLEDGMENTS

We would like to thank Peter Rist, who designed the life-like character used in this project. We would also like to thank the FLUIDUM research group for providing the hardware used to realize the VRI project.

References

1. Towns, S.G., Voerman, J.L., Callaway, C.B., Lester, J.C.: Coherent gestures, locomotion, and speech in life-like pedagogical agents. In: IUI '98: Proceedings of the 3rd International Conference on Intelligent User Interfaces, ACM Press (1998) 13–20
2. Coen, M.: Design principles for intelligent environments. In: AAAI/IAAI. (1998) 547–554
3. Kidd, C., Orr, R., Abowd, G., Atkeson, C., Essa, I., MacIntyre, B., Mynatt, E., Starner, T., Newstetter, W.: The aware home: A living laboratory for ubiquitous computing research. In: Cooperative Buildings. (1999) 191–198
4. Hanssens, N., Kulkarni, A., Tuchinda, R., Horton, T.: Building agent-based intelligent workspaces. In: Proceedings of the ABA Conference. (2002) To appear.
5. Coen, M., Phillips, B., Warshawsky, N., Weisman, L., Peters, S., Finin, P.: Meeting the computational needs of intelligent environments: The Metaglue system. In: Proceedings of MANSE'99, Dublin, Ireland (1999)
6. Brumitt, B., Meyers, B., Krumm, J., Kern, A., Shafer, S.: EasyLiving: Technologies for intelligent environments. In: HUC. (2000) 12–29
7. Lester, J.C., Voerman, J.L., Towns, S.G., Callaway, C.B.: Deictic believability: Coordinated gesture, locomotion, and speech in lifelike pedagogical agents. Applied Artificial Intelligence 13 (1999) 383–414
8. Baus, J., Krüger, A., Wahlster, W.: A resource-adaptive mobile navigation system. In: IUI 2002: International Conference on Intelligent User Interfaces, New York, ACM Press (2002) 15–22
9. Brandherm, B., Schwartz, T.: Geo referenced dynamic Bayesian networks for user positioning on mobile systems. In: Submitted to the 2005 International Workshop on Location- and Context-Awareness. (2005)
10. Johanson, B., Fox, A.: The Event Heap: A coordination infrastructure for interactive workspaces. In: Proceedings of the Workshop on Mobile Computing Systems and Applications. (2002)
11. Pinhanez, C.: The Everywhere Displays projector: A device to create ubiquitous graphical interfaces. Lecture Notes in Computer Science (2001)
12. Schmitz, M.: SAFIR: Spatial audio framework for instrumented rooms. In: Proceedings of the Workshop on Intelligent and Transparent Interfaces, held at Advanced Visual Interfaces 2004, Gallipoli, Department of Computer Science, Saarland University, Germany (2004)
13. Müller, C., Wittig, F.: Speech as a source for ubiquitous user modeling. In: Website of the Workshop on User Modelling for Ubiquitous Computing at UM 2003, http://www.di.uniba.it/~ubium03/ (2003)
14. Kruppa, M., Krüger, A., Rocchi, C., Stock, O., Zancanaro, M.: Seamless personalized TV-like presentations on mobile and stationary devices in a museum. In: Proceedings of the 2003 International Cultural Heritage Informatics Meeting. (2003)
15. Kruppa, M.: Towards explicit physical object referencing. In: Proceedings of the 10th International Conference on User Modeling, Springer (2005) 506–508
