The Weaved Reality: What Context-aware Interface Agents Bring About

Kenji Mase, Yasuyuki Sumi and Rieko Kadobayashi

ATR Media Integration & Communications Research Labs, Kyoto, Japan
{mase, sumi, kado} [email protected]

ABSTRACT

Weaved Reality is a novel concept extending so-called Augmented Reality and Mixed Reality. It is not just a presentation-level mixture of computerized virtual material and real material. It essentially contains knowledge as a medium and context-aware interface agents as devices for facilitating communication between humans. This paper presents the concept of the weaved reality and a few prototype systems toward its realization.

1. INTRODUCTION

Computers and computer networks are becoming an integral part of our life with the common availability of features such as e-mail, chat, and meetings over the network. However, these activities are often isolated from our real life, such as visiting places, family activities, elementary education, adult care, artistic work, etc. Networking frees our communication from the limitations of time and location. However, what we have obtained with this advanced computerized technology is the exclusion and neglect of the temporal and spatial contexts of the real world, which are very informative and important for human-to-human communication. There has been much research on the Augmented Reality (AR) and Mixed Reality (MR) paradigms. However, systems based on them again forget the situated contexts of the real world, or at best assume a fixed context and extend the discussion under that limited situation. In most cases, the discussion is given only at the representation level of the geometrical configuration of the world, such as shapes, lighting, alignments, and inter-relations between virtual and real objects. Weaved Reality is a novel concept extending AR and MR. It is not just a presentation-level mixture of computerized virtual material and real material. It also contains knowledge as a medium and context-aware interface agents as devices for facilitating communication between people. The knowledge and the context-awareness become the novel weaving threads of the reality of integrated systems of computerized materials and real materials. This paper presents the above-mentioned concept of the weaved reality and a few prototype systems toward its realization.

2. WEAVED REALITY

The target of the weaved reality concept is the next generation of social information systems. Local area networking is expanding to wide area networking. Organizations are becoming more dynamic. The objectives and benefits of any activity are no longer centralized but distributed. The essence of activity is changing from finding solutions to finding issues to be solved. Interest and a volunteering mind empower the activities, rather than duty or direct reward. Given these movements, we need to look into the future of our group activities in terms of global community computing [1]. We have been duplicating the real world in virtual reality systems that use various input devices such as keyboards, mice, and joysticks, with output images on screens to connect the two worlds. These devices are situation-free, and thus such a system provides the same interactions and contents no matter where it is used and no matter who is using it. This is an important aspect of the generality of such systems. However, it neglects valuable information about the situated context, such as "Where are we now?" and "What are we?" The Perceptual User Interface (PUI) is considered capable of providing such contextual information. Not only visual information, but also auditory, speech, tactile, location, temperature, and other information should be utilized for context sensing and for weaved reality information systems.
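To make the notion of context sensing more concrete, the following is a minimal sketch, in Java, of the kind of per-user context record that the perceptual sensing layer of a weaved reality system might maintain; all class and field names here are our own illustration and are not part of any system described in this paper.

import java.time.Instant;
import java.util.List;

// A minimal sketch of a per-user context record assembled from perceptual
// sensors. All field names are illustrative assumptions, not an API of the
// systems described in the paper.
public class UserContext {
    public final String userId;
    public final Instant timestamp;     // temporal context
    public final String location;       // spatial context, e.g. an exhibit site id
    public final double temperatureC;   // ambient sensing
    public final List<String> gestures; // visual/gestural cues, e.g. "raise-left-hand"
    public final String utterance;      // last recognized speech fragment, if any

    public UserContext(String userId, Instant timestamp, String location,
                       double temperatureC, List<String> gestures, String utterance) {
        this.userId = userId;
        this.timestamp = timestamp;
        this.location = location;
        this.temperatureC = temperatureC;
        this.gestures = gestures;
        this.utterance = utterance;
    }

    // A trivial example of turning raw context into a situated answer to
    // "Where are we now?" for an interface agent.
    public String whereAreWeNow() {
        return userId + " is at " + location + " at " + timestamp;
    }

    public static void main(String[] args) {
        UserContext ctx = new UserContext("visitor-42", Instant.now(), "exhibit-A",
                22.5, List.of("raise-left-hand"), null);
        System.out.println(ctx.whereAreWeNow());
    }
}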

3. C-MAP

The C-MAP (Context-aware Mobile Assistant Project) [2] is an attempt to build a tour guidance system that provides information to visitors at exhibitions based on their locations, responses, and individual interests. Our long-term goal is to investigate future computer-augmented environments that enhance communication and information sharing between people and knowledgeable machines situated in the real world. In order to investigate how to create such a knowledge medium, we have chosen museums and showcase exhibitions such as open houses and trade shows. These are places where knowledge is accumulated and/or conveyed, and where specialist exhibitors provide knowledge to visitors with diverse interests and viewpoints. The concept is also applicable to digital cities.

Figure 1: C-MAP at an exhibition site

The C-MAP system principally consists of servers providing exhibit-related information and guide information, and portable PCs (see Figure 1) or pocketable personal digital assistants (PDAs, see Figure 2) connected with the servers by a wireless LAN or IR communicators. The servers act as centralized data storage and/or kiosk terminals for retrieving information related to the individual exhibits. Some users, such as exhibitors and intensive visitors, wear an active badge (Figure 1) for real-time tracking of user locations. This sensor can be replaced by omni-directional computer vision systems. The IR connections with the kiosks in the exhibition leave explicit traces of the visited places. The implicit and explicit traces of a visit are very important and are the key to realizing functional weaved reality systems. The implicit trace provides an easy and comfortable interactive environment for users through non-interactive visit records. The explicit trace requires users to act intentionally; however, it can capture emotional states or explicit thoughts such as strong interest in a particular exhibit. We have not yet investigated this, but automatic face and gesture recognition technology could be well utilized in this scenario.
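The distinction between implicit and explicit traces can be made concrete with a small sketch. The following Java fragment is an illustrative assumption of how the two kinds of visit records might be represented; it is not the actual C-MAP data model.

import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// A sketch (our illustration, not the C-MAP implementation) of implicit and
// explicit visit traces. Implicit traces come from sensors such as the active
// badge without any user action; explicit traces come from intentional acts
// such as rating an exhibit at a kiosk.
public class VisitTrace {
    enum Kind { IMPLICIT, EXPLICIT }

    static class Entry {
        final Kind kind;
        final String exhibitId;
        final Instant time;
        final String detail; // e.g. "badge-detected" or "rated:5"
        Entry(Kind kind, String exhibitId, Instant time, String detail) {
            this.kind = kind;
            this.exhibitId = exhibitId;
            this.time = time;
            this.detail = detail;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Called by the location sensing layer (active badge or vision tracker).
    void recordBadgeDetection(String exhibitId) {
        entries.add(new Entry(Kind.IMPLICIT, exhibitId, Instant.now(), "badge-detected"));
    }

    // Called when the visitor deliberately interacts with a kiosk.
    void recordRating(String exhibitId, int rating) {
        entries.add(new Entry(Kind.EXPLICIT, exhibitId, Instant.now(), "rated:" + rating));
    }

    List<Entry> all() { return entries; }
}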

3.1 C-MAP Mobile Assistant

We prototyped mobile assistants and have been testing them at the two-day open-house exhibitions held by our research laboratory. The hardware architecture of the system is as follows. We use Windows PCs with either a pen-based interface or a standard keyboard. To connect these PCs to the servers, we use a 1.2 GHz radio wireless LAN. The Web server serves Java applets for the mobile assistant as well as Web pages related to the exhibits. The Active Badge System (ABS) server collects visitors' locations from sensors installed at each exhibit site. The sensors can detect badges within a 1 to 2 meter perimeter (the newer ELPAS system can detect a longer range). The agent server provides guidance, such as tour planning and exhibit recommendation, by monitoring the ABS information and each user's interaction with the system on the portable PCs.
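As a rough illustration of the agent server's role described above, the following sketch shows how Active Badge System sightings and portable-PC interactions might be merged into per-visitor state. The class and method names are hypothetical and do not reproduce the actual implementation.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A minimal sketch, with hypothetical interfaces, of the agent server's event
// handling: it merges Active Badge System (ABS) location reports with
// interaction events from the portable PCs to keep each visitor's context
// up to date.
public class AgentServerSketch {
    // exhibit site currently nearest to each badge-wearing visitor
    private final Map<String, String> lastKnownSite = new ConcurrentHashMap<>();

    // Invoked by the ABS server whenever a sensor at an exhibit site sees a badge.
    public void onBadgeSighting(String visitorId, String exhibitSiteId) {
        lastKnownSite.put(visitorId, exhibitSiteId);
        maybeRecommend(visitorId);
    }

    // Invoked when the visitor interacts with the guide applet on the portable PC.
    public void onPortablePcInteraction(String visitorId, String exhibitId, String action) {
        // e.g. update the interest profile; omitted in this sketch
        maybeRecommend(visitorId);
    }

    private void maybeRecommend(String visitorId) {
        String site = lastKnownSite.get(visitorId);
        if (site != null) {
            System.out.println("Suggest exhibits near " + site + " to " + visitorId);
        }
    }
}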

Figure 2: C-MAP Palmguide

The guide agent for each portable PC, which runs on the agent server, computes the personalized guidance according to the user's context and displays the result on the portable PC. The server and the portable PCs connect via the LAN, which further connects to the Internet and is therefore open to the outside. Currently, we are developing a general architecture to personalize exhibit displays according to a user's context by using palm-size PDAs such as 3Com's Palm III (see Figure 2). The PDA can communicate, via an infrared communicator, with exhibit displays that are ubiquitously located as kiosks in the exhibition site. It keeps its user's personal profile and touring records, which are conveyed to the exhibit displays when the user points the PDA at them. The individual exhibit displays are then personalized for presentation based on the user's context. This architecture is more practical than the first prototype, which used heavier portable PCs. The new system also facilitates the protection of the user's privacy and eases the exhibitors' preparation, because the individual user data and the exhibit information can be managed in a distributed manner. Note that since most exhibits at our open house and at high-tech museums demonstrate computer applications, the exhibit applications can share information with the mobile assistant servers over the LAN. Consequently, exhibitors were able to provide highly personalized demonstrations by using the personal data (e.g., personal interests, touring histories, profiles) accumulated in the guide agent server. The next section illustrates an instance of the interface agent that personalizes the exhibit for an individual visitor.
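The PDA-to-kiosk exchange can be sketched as follows. The profile fields, interface names, and personalization rule below are illustrative assumptions only; they indicate how a kiosk could adapt its presentation to the beamed profile, not how the deployed system is coded.

import java.util.List;

// A sketch of the data a PDA might beam to an exhibit kiosk over the infrared
// link so that the kiosk can personalize its presentation. The record layout
// and method names are our assumptions for illustration only.
public class PalmguideSketch {
    record VisitorProfile(String visitorId, List<String> interests, List<String> visitedExhibits) {}

    interface ExhibitKiosk {
        // Kiosk-side entry point: receives the profile and adapts its display.
        void personalize(VisitorProfile profile);
    }

    static class DemoKiosk implements ExhibitKiosk {
        @Override
        public void personalize(VisitorProfile p) {
            if (p.interests().contains("archaeology")) {
                System.out.println("Showing excavation details first for " + p.visitorId());
            } else {
                System.out.println("Showing the general overview for " + p.visitorId());
            }
        }
    }

    public static void main(String[] args) {
        VisitorProfile profile = new VisitorProfile("visitor-42",
                List.of("archaeology"), List.of("exhibit-A", "exhibit-B"));
        new DemoKiosk().personalize(profile); // stands in for the IR beam
    }
}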

4. INTERFACE AGENT IN VisTA-Walk

The C-MAP assistant collects personal data, such as the visit log, in the personal mobile computer agent system. This makes it possible for an exhibit to be tailored to a particular user based on those data. In this section, we show an example of such personalization in a computerized exhibit that "exploits the accumulated real-world context for presentation in the virtual space" [3]. Museums are great archives of society, and they display wonderful natural phenomena, historical artifacts, artistic masterpieces, and human knowledge for visitors to come and see. We are investigating the presentation styles of future museums and have proposed the Meta-Museum concept [4]. Meta-Museum aims to provide a communication environment between experts, e.g., researchers and curators, and non-experts, e.g., visitors, by using seamlessly integrated presentations of virtual objects and exhibits. One of the issues confronting the realization of such a Meta-Museum is how to provide easy-to-use interfaces for exploring a virtual world and its integrated presentations. In the course of this research, we have developed the gesture interface of the VisTA-Walk system, which replaces mouse interactions with full-body gesture interactions, e.g., walking through a virtual space and accessing reference information about objects [5]. We use the Pfinder [6] program module as a perceptual (gestural) user interface library and integrate it with VisTA. Events from the Pfinder program are translated into mouse events and fed to the VisTA system. In the following subsections, we first present the VisTA-Walk system, then explain its gesture interface, and finally demonstrate the interface agent that guides the VisTA-Walk world based on the real-world context.
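The translation of Pfinder events into mouse events can be pictured with a short sketch. The event names and the sink interface below are assumptions made for illustration; the real module boundaries in VisTA-Walk may differ.

// A sketch of the adapter idea described above: gesture events reported by
// Pfinder are mapped onto the mouse events that VisTA already understands.
// The event names and the MouseEventSink interface are illustrative
// assumptions.
public class GestureToMouseAdapter {
    public enum PfinderEvent { STRETCH_LEFT_HAND, STRETCH_RIGHT_HAND, RAISE_LEFT_HAND, RAISE_RIGHT_HAND }

    public interface MouseEventSink {
        void click(int x, int y);   // VisTA-side handler, e.g. select the object at (x, y)
    }

    private final MouseEventSink sink;
    private final int screenWidth;
    private final int screenHeight;

    public GestureToMouseAdapter(MouseEventSink sink, int screenWidth, int screenHeight) {
        this.sink = sink;
        this.screenWidth = screenWidth;
        this.screenHeight = screenHeight;
    }

    // Translate a recognized gesture into a synthetic click on the left or
    // right half of the screen, mirroring the "left one"/"right one" selection.
    public void onGesture(PfinderEvent event) {
        int y = screenHeight / 2;
        switch (event) {
            case STRETCH_LEFT_HAND, RAISE_LEFT_HAND -> sink.click(screenWidth / 4, y);
            case STRETCH_RIGHT_HAND, RAISE_RIGHT_HAND -> sink.click(3 * screenWidth / 4, y);
        }
    }
}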

4.1 VisTA-Walk

VisTA simulates the temporal transition of ancient villages for archaeological researchers and provides users with a walk-through viewer. Web pages such as vestige-site excavation records are hyper-linked from a vestige database shown with a viewer. Figure 3 illustrates the VisTA-Walk system and shows an interaction scene. Pfinder is the gesture recognition program and uses one video camera mounted on top of a screen. We have set the camera on top of a large screen (170 inches) and use several of the Pfinder outputs, e.g., standing position, crouching position, and stretching of arms (left and/or right), which are translated into mouse events. VisTA-Walk contains models of the land of a village, its vestiges and buildings, and the hypothetical lifetime of each building. The models are written in and controlled under Open Inventor. The database keeps reference records about the vestiges in the form of HTML. A reference is displayed in the browser when a building is selected in the main viewer.
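The path from selecting a building to displaying its reference record can be sketched as follows; the class names, the example URL, and the map-based lookup are hypothetical, serving only to illustrate the event-handler/browser split shown in the Figure 3 block diagram.

import java.util.Map;

// A sketch of the handler path: selecting a building in the viewer looks up
// its vestige record and asks the WWW browser to display the corresponding
// HTML page. Class names and the example URL are illustrative assumptions.
public class VestigeReferenceHandler {
    interface Browser { void open(String url); }

    private final Map<String, String> vestigeToUrl; // building id -> HTML record URL
    private final Browser browser;

    VestigeReferenceHandler(Map<String, String> vestigeToUrl, Browser browser) {
        this.vestigeToUrl = vestigeToUrl;
        this.browser = browser;
    }

    // Called by the viewer when a building is selected (originally via a mouse
    // event, or via the gesture interface described in the next subsection).
    void onBuildingSelected(String buildingId) {
        String url = vestigeToUrl.get(buildingId);
        if (url != null) {
            browser.open(url);
        }
    }

    public static void main(String[] args) {
        VestigeReferenceHandler h = new VestigeReferenceHandler(
                Map.of("pit-dwelling-3", "http://example.org/vestiges/pit-dwelling-3.html"),
                url -> System.out.println("Browser opens " + url));
        h.onBuildingSelected("pit-dwelling-3");
    }
}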

4.2 Gesture Interface of VisTA-Walk

Figure 3: A scene in the VisTA-Walk system and the system block diagram (Pfinder, event handler, VisTA model data in Open Inventor, viewer, and WWW browser with HTML reference documents): gesture-based walk-through and information access in a virtual ancient village. A TV camera is set on top of the screen.

Figure 4: VisTA-Walk: A Gesture Interface Virtual Walkthrough with an Interface Agent

The following are the major interactions of VisTA: (i) walking through a virtual space for exploration, and (ii) pointing at an object in a virtual space to get reference records. It is acceptable for a user to use different parts of the body for these two independent interactions. Based on this principle, a plausible interaction design is that the standing position controls walking through and hand gestures control pointing. How can we assign the standing position to the control events of walking through? There are at least three choices of metaphor for control: mouse, joystick, or steering wheel and accelerator. The positioning action of a mouse is that of a locator-type device, while a joystick and a steering wheel are valuator-type devices. We employ a steering-wheel-like control scheme for gesture interaction in the VisTA-Walk system, while providing mouse control for desktop interaction. Similar to the movement of a steering wheel, stepping aside from the neutral position steers the direction of walking. Stepping forward or backward acts as an accelerator with which a user moves the viewing position forward or backward in the virtual environment. In order to stop at a desired point in a scene, a user must return to the neutral position. The speed of steering and walking is proportional to the stepping distance from the neutral position, and the ratio of speed to distance affects the assessment of usability. If the speed is too fast, a user can quickly arrive within the proximity of a destination point, but it is hard to stop exactly at the desired point. On the other hand, if the speed is too slow, a user may step far from the neutral point to arrive at a point quickly; stopping at the desired point then requires moving back by several steps, which causes inaccuracy. However, the usability tests showed that users enjoy traveling at a slow speed and can arrive at the desired point without difficulty. Walking through is realized by controlling the viewing position in a virtual 3-D space with a virtual camera. A 2-D position on the terrain surface is necessary and sufficient when we only simulate walking actions. However, changes in the height of the viewing position and the viewing elevation add to the immersive sensation for a user. The current interpretations of pointing gestures are a "left one" or a "right one". The VisTA-Walk gesture interpreter uses the output recognition results of Pfinder, e.g., "stretch left/right hand" or "raise left/right hand". The interpreter chooses an object on the left (right) side when it detects a "left (right) one" request from the Pfinder outputs. The pointing actions recognized by Pfinder are insufficient to locate a particular object among many, and this is due to the camera position: pointing and stretching a hand toward the screen is hard to detect because the camera is on top of the screen. A stereoscopic multiple-camera arrangement is necessary to get complete 3-D information about hand and body gestures [7], and such an arrangement is presently unrealistic, because many cameras would be necessary to cover the complete area of a user in motion. Ambiguity is not a problem here, because a user is situated in the virtual space and is able to move around. Suppose a user wants to indicate a particular object from among many at a distance. She only needs to come closer to the object and then stretch her hand toward the object. Moreover, this works even when no distinctive feature other than the location is available. We will integrate other media such as voice to complement situations such as saying "I want that house on the left with the red roof."
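The steering-wheel-and-accelerator mapping described above can be summarized in a short sketch: sideways displacement from the neutral position steers, forward/backward displacement accelerates, and both effects scale with the stepping distance. The gain constants and class names are illustrative assumptions, not measured values from the usability tests.

// A sketch of the steering-wheel-and-accelerator mapping: the user's standing
// position, relative to a neutral point, is turned into a turning rate and a
// forward/backward speed, both proportional to the stepping distance.
public class SteeringWalkController {
    private final double turnGain;    // radians per second per meter of sideways offset
    private final double speedGain;   // meters per second per meter of forward offset

    public SteeringWalkController(double turnGain, double speedGain) {
        this.turnGain = turnGain;
        this.speedGain = speedGain;
    }

    /** Virtual camera state updated on every time step. */
    public static final class Pose {
        public double x, z;     // position on the terrain
        public double heading;  // radians
    }

    // offsetX: sideways displacement from the neutral position (meters)
    // offsetZ: forward (+) / backward (-) displacement from neutral (meters)
    // dt: time step in seconds
    public void step(Pose camera, double offsetX, double offsetZ, double dt) {
        camera.heading += turnGain * offsetX * dt;            // steer
        double speed = speedGain * offsetZ;                   // accelerate
        camera.x += speed * Math.sin(camera.heading) * dt;    // move along heading
        camera.z += speed * Math.cos(camera.heading) * dt;
        // Standing at the neutral position (offsets near zero) stops the walk.
    }
}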

4.3 Context-aware Interface Agent

A visitor to the personalized exhibits observes the demonstrations while his or her personal guide agent, running on a palmtop computer, provides information about the demonstrations and suggests which one to visit next. This personal guide agent also appears in a seamless manner when the visitor comes to a demonstration in VisTA-Walk. Here, we use the word "seamless" in the sense that a visitor does not need to be aware of which world he is currently visiting and also does not need to invoke his personal guide agent; the agent should always be attentive to the visitor without any explicit indication by the visitor. Seamless guidance allows visitors to devote their attention to the primary tasks, i.e., seeing, thinking about, and understanding the contents of exhibits. A typical scenario for seamless guidance in the VisTA-Walk computerized demonstration is as follows:

1. A visitor visits many exhibits and demonstrations, and arrives at the VisTA-Walk site.

2. Once the sensor identifies the visitor in the area of the VisTA-Walk system, the personal guide agent automatically transfers from the PDA and appears in the virtual village.

3. The personal guide agent starts guiding the visitor. The guidance mode depends on how he or she has interacted with the past exhibits, e.g., the time spent, the information access frequency, and the ratings of individual exhibits.

Based on these interaction measures, the system computes the activation level of the user in order to choose one of the guidance modes. There are at least three major guidance modes: (i) the agent takes whole control and initiative of navigation based on its plan; (ii) the agent allows the user to take control of navigation, but still suggests a course, trying to keep to the planned course; and (iii) the agent has no control, but follows the user's navigation while giving guidance information. Mode (i) is suitable for passive users, as they do not need to do anything but watch the given presentations. Mode (iii) is welcomed by very active users, as it gives whole control to the user, who can explore the virtual world as much as time allows and interest continues. The autonomous behavior of the agent in mode (ii) is very interesting and provides a natural social interaction with the user. The planning of the course may depend on the time the visitor has allotted to the place; it is then possible to vary the duration of the tour within the various mode choices as well.
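The computation of the activation level and the choice among the three guidance modes can be sketched as follows; the weighting (a plain average) and the thresholds are illustrative assumptions, since the actual formula is not reported here.

// A sketch of the mode selection described above: interaction measures from
// earlier exhibits (time spent, access frequency, ratings) are combined into a
// single activation level, which then picks one of the three guidance modes.
// The weights and thresholds are illustrative assumptions, not reported values.
public class GuidanceModeSelector {
    public enum Mode { AGENT_LED, MIXED_INITIATIVE, USER_LED }

    // Normalized interaction measures in [0, 1] accumulated by the guide agent.
    public static double activationLevel(double timeSpent, double accessFrequency, double meanRating) {
        return (timeSpent + accessFrequency + meanRating) / 3.0; // simple average as an example
    }

    public static Mode select(double activation) {
        if (activation < 0.33) return Mode.AGENT_LED;          // passive visitor: mode (i)
        if (activation < 0.66) return Mode.MIXED_INITIATIVE;   // mode (ii)
        return Mode.USER_LED;                                  // very active visitor: mode (iii)
    }

    public static void main(String[] args) {
        double a = activationLevel(0.8, 0.7, 0.9);
        System.out.println("activation=" + a + ", mode=" + select(a));
    }
}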

5. CONCLUSION

We have presented the concept of Weaved Reality and a few prototype systems based on it. It is a technical concept for integrating the existing AR and MR technologies with "AI" technologies such as knowledge management, context computing, and interface agents. It is notable that it goes beyond a presentation-level mixture of computerized virtual material and real material. Computer vision has a huge potential to provide contextual information about the users. The perceptual user interface is a post-WIMP interface and is the key to unburdened interactions with interface agents [8]. The intelligence of computers is becoming more important in the information age. We believe that computers embedded in the real world can better facilitate both human-to-human and human-to-computer communication when they know and understand the context and situation of their users. For instance, it will become more difficult for us to access and control information databases as the Internet expands further; information should be presented to individuals personally without a lot of filtering effort, and computer vision is of course a promising technology for this. We need to investigate sensing techniques for people further in order to obtain rich contextual and semantic information. The integration with knowledge and context computing will lead us, we believe, to an enriched real life empowered by the weaved reality.

ACKNOWLEDGEMENTS

The authors would like to thank Yasuyoshi Sakai and Ryohei Nakatsu for their support and encouragement of this research. The authors would also like to thank Sidney Fels, Tameyuki Etani, Kaoru Sumi, Kazushi Nishimoto, Nicolas Simonet, Keiko Nakao, Tetsushi Yamamoto, and Tadashi Takumi for valuable discussions and for the implementation of the systems provided. The Pfinder gesture recognition system used in VisTA-Walk was provided by the Perceptual Computing Section, Media Laboratory, MIT.

References

[1] Kenji Mase, Yasuyuki Sumi, and Kazushi Nishimoto, "Informal conversation environment for collaborative concept formation", in Toru Ishida, editor, Community Computing: Collaboration over Global Information Networks, pp. 165–205, John Wiley & Sons, 1998.

[2] Yasuyuki Sumi, Tameyuki Etani, Sidney Fels, Nicolas Simonet, Kaoru Kobayashi, and Kenji Mase, "C-MAP: Building a context-aware mobile assistant for exhibition tours", in Toru Ishida, editor, Community Computing and Support Systems, LNCS 1519, pp. 138–155, Springer, Nov. 1998.

[3] Rieko Kadobayashi and Kenji Mase, "Seamless guidance by personal agent in virtual space based on user interaction in real world", in The Third International Conference on the Practical Application of Intelligent Agents and Multi-Agent Technology (PAAM'98), pp. 191–200, London, March 1998.

[4] Kenji Mase, Rieko Kadobayashi, and Ryohei Nakatsu, "Meta-museum: A supportive augmented reality environment for knowledge sharing", in Int'l Conf. on Virtual Systems and Multimedia '96, pp. 107–110, Gifu, Japan, Sept. 1996.

[5] Rieko Kadobayashi, Kazushi Nishimoto, and Kenji Mase, "Design and evaluation of gesture interface of an immersive walk-through application for exploring cyberspace", in Proc. of the Third International Conference on Automatic Face and Gesture Recognition (FG'98), pp. 534–539, Nara, Japan, April 1998.

[6] Christopher Wren, Ali Azarbayejani, Trevor Darrell, and Alex Pentland, "Pfinder: Real-time tracking of the human body", in 2nd International Conf. on Automatic Face and Gesture Recognition, Killington, Vermont, Oct. 1996.

[7] Masaaki Fukumoto, Kenji Mase, and Yasuhito Suenaga, "Finger-pointer: Pointing interface by image processing", Computers & Graphics, 18(5), pp. 633–642, May 1994.

[8] Kenji Mase, "Human Reader: A vision-based man-machine interface", in Robert Cipolla and Alex Pentland, editors, Computer Vision for Human-Machine Interaction, pp. 53–81, Cambridge University Press, 1998.
