Mobile Reality: A PDA-Based Multimodal Framework Synchronizing a Hybrid Tracking Solution with 3D Graphics and Location-Sensitive Speech Interaction

Stuart Goose 1, Heiko Wanning 2 *, Georg Schneider 3 *

1 Multimedia Technology Department, Siemens Corporate Research, Inc., 755 College Road East, Princeton, NJ 08540, USA, +1 609 734 6500, [email protected]
2 Computer Science Department, University of Saarlandes, Postfach 15 11 50, 66041 Saarbrücken, Germany, +49 (0681) 302-3418, [email protected]
3 Computer Science Department, Fachhochschule Trier, Standort Schneidershof, D-54293 Trier, Germany, +49 651 8103 580, [email protected]

* This research was conducted while working at Siemens Corporate Research, Inc.
Abstract. A maintenance engineer who talks to pumps and pipes may not seem like the ideal person to entrust with keeping a factory running smoothly, but we hope that our Mobile Reality framework will enable such behavior in the future to be anything but suspicious! Described in this paper is how the Mobile Reality framework, running entirely on a Pocket PC, synchronizes a hybrid tracking solution to offer the user a seamless, location-dependent, mobile multimodal interface. The user interface juxtaposes a three-dimensional graphical view with a context-sensitive speech dialog centered upon objects located in the immediate vicinity of the mobile user. In addition, support for collaboration enables shared VRML browsing with annotation and a full-duplex voice channel.
1. Introduction and Motivation

In recent years we have witnessed the remarkable commercial success of small screen devices, such as cellular phones and Personal Digital Assistants (PDAs). Recent market studies predict inexorable growth for mobile computing devices and wireless communication. Technology continues to evolve, allowing an increasingly peripatetic society to remain connected without any reliance upon wires. As a consequence, mobile computing is a growth area and the focus of much energy. Mobile computing heralds exciting new applications and services for information access, communication and collaboration across a diverse range of environments.

Keyboards remain the most popular input device for desktop computers. However, performing input efficiently on a small mobile device is more challenging, and this need continues to motivate innovators. Speech interaction on mobile devices has gained in currency over recent years, to the point where a significant proportion of mobile devices now include some form of speech recognition. The value proposition for speech interaction is clear: it is the most natural human modality, can be performed while mobile, and is hands-free.

Although virtual reality tools are used for a multitude of purposes across a number of diverse markets, they have yet to become widely deployed and used in mainstream computing. The ability to model real world environments and augment them with animations and interactivity has benefits over conventional interfaces. However, navigation and manipulation in 3D graphical environments can be difficult, and disorientating, especially when using a conventional mouse.

A panoply of very small and inexpensive sensors, suitable for integration within mobile devices, is becoming increasingly available [24]. These sensors can report various data about the surrounding environment, relative movement, and so on. The hypothesis that motivated this research is that inexpensive sensors could be exploited to provide continual location information, which in turn could seamlessly and automatically drive navigation through a VRML scene of the real world. In addition to eradicating the complexity of 3D navigation, integrating context-sensitive speech interaction would further simplify and enrich the interface.

Siemens is the world's largest supplier of products, systems, solutions and services in the industrial and building technology sectors. To help maintain this leading position, one future trend we have been focusing on at Siemens Corporate Research is applying 3D interaction and visualization techniques to the industrial automation domain. Service and maintenance is by necessity a peripatetic activity, and as such one continuing aspect of our research focuses upon improving automated support for this task.

The research on the Mobile Reality framework reported in this paper was a project to evaluate current commercially available wireless mobile devices and their suitability for providing a rich multimodal user interface and supporting relatively demanding interaction requirements. A functional overview of Mobile Reality can be seen in figure 1. To the knowledge of the authors, this is the first reported solution running on a regular commercially available PDA that synchronizes hybrid tracking input with a multimodal user interface to drive the navigation of a VRML scene. In addition, Mobile Reality offers location-sensitive speech interaction and mobile collaboration support.

A survey of related work is given in section 2. Section 3 presents an exemplar application scenario. A detailed description of the system architecture is offered in section 4. Section 5 proposes areas for further research, and some concluding remarks are provided in section 6.
Figure 1: A functional overview of Mobile Reality.
2. Related Work

Situated computing [10] considers environmental factors such as user location, identity and profile, and seeks to provide techniques for developing situated, context-aware and intelligent mobile computing systems. Such systems are able to process and interpret this information as well as react to it. Consequently, situated computing pursues concrete application scenarios in order to provide new classes of user-centric mobile applications and services with more personal and appropriate behavior. The Active Badge System [29] tracks the positions of people wearing badges in an office environment and can, for example, route phone calls to the telephone located closest to the person. The MemoClip system [3] provides users with location-based messages: when a user approaches a sensor that is physically associated with a reminder, the MemoClip displays the corresponding message. Alternative tracking and localization solutions for mobile computing have been reported that make extensive use of computer vision-based algorithms for the detection of unique visual markers [22, 31]. Some vision-based approaches can deduce accurate distance and orientation information. In contrast to these approaches, Mobile Reality uses a hybrid tracking solution that fuses the input from infrared beacons and a three degrees-of-freedom (3 DOF) inertia tracker. These are affordable technologies that could be integrated with the current generation of PDAs without consuming excessive processor cycles, unlike vision-based techniques.

Context-awareness is a further focus of ubiquitous systems. The Cyberguide system [2] provides mobile handheld tour guides of a university campus. Knowledge of the user's physical location at the present time and in the past, which is identified as context information, is used to provide a guidance system that aims to simulate real guided tours. The Medicine Cabinet [28] is a piece of intelligent bathroom furniture imbued with speech capabilities for offering personalized health-related advice. ComMotion [16] is a mobile location-aware system for linking personal information to locations in a user's life. Both a GUI and a speech interface are available. ComMotion differs from Mobile Reality in that it runs on a portable PC and uses GPS for location detection, while the Medicine Cabinet was not designed to be mobile.

The benefits of mobile maintenance [26] and virtual environments [4] to the industrial sector have been reported. Nilsson et al [20] describe a handheld custom device, called the Pucketizer, designed to assist maintenance engineers on the factory floor.

The authors found little evidence of prior work on the integration of speech technology within virtual environments. Our motivation for integrating speech technology was to enable hands-free interaction - useful for a proportion of maintenance tasks - with the 3D environment and to leverage the potential benefits of a multimodal interface [8]. VRML browsers have no native support for speech technology, although most support playing static digital audio files in 3D and a few support streamed audio. Ressler et al [23] describe a desktop solution for integrating speech synthesis output within VRML. In comparison with Mobile Reality, their solution is not mobile, there is no means for parameterizing the speech output, and they do not consider the integration of speech recognition at all. Mynatt et al [18] describe a system called Audio Aura for providing office workers with rich auditory cues (via wireless headphones), within the context of VRML, that describe the current state of the physical objects that interest them. Although Audio Aura and Mobile Reality have their roots in VRML, they differ in many ways. Mobile Reality supports speech input and output for dialog, whereas Audio Aura is concerned with generating a continuous audio output that, by defining ecologies, is a metaphor for the local environment.

Navigation and manipulation in desktop 3D graphical environments can be difficult, unnatural and confusing, especially when using a conventional mouse. This difficulty spawned research into novel input and control devices [30]. Fitzmaurice et al [6] in 1993 simulated a palmtop computer to, among other things, evaluate how novel input devices can expedite interaction in virtual environments on handheld devices. Hinckley et al [9] describe how a Cassiopeia was augmented with multiple sensors providing environmental data, allowing inferences about the current context to be drawn. It was shown how the fusion of the context data could be exploited to offer adaptive and more intuitive interaction with the mobile device. Examples include automatic power on/off, automatic landscape/portrait flipping, automatic application invocation, etc.
Newman et al [19] describe an AR system in which the user wears a positioning device that calculates the time intervals for ultrasonic pulses to reach receivers in fixed positions. Using an iPAQ running Linux, the X-Windows display of a remote server application is redirected to the iPAQ display, allowing the user to view the user interface. Mobile Reality, by contrast, runs on the iPAQ itself, has a multimodal interface including speech, and uses a different tracking technology. Analysis of three case studies is reported by Luff et al [15], in which they describe the need for specific types of mobile collaboration support. Mobile ad-hoc collaboration is also emerging as a research area. Collaboration support in Mobile Reality is currently strictly peer-to-peer and limited to two parties; however, it offers a few novel features.
3. Exemplar Application Scenario

In this section, an exemplar application is presented that makes use of much of the Mobile Reality functionality. The application is concerned with mobile maintenance. The 2D floor plan of our office building can be seen in figure 2(a). It has been augmented to illustrate the positions of the infrared beacons (labeled IR1 to IR5) and their coverage zones in red, and the proximity sensor regions (labeled PS1 to PS6) in yellow (the technology behind the proximity sensors is explained in section 4). The corresponding VRML viewpoint for each infrared beacon can be appreciated in figure 2(b).
3.1. Mobile Maintenance
The mobile maintenance technician arrives to fix a defective printer. He enters the building and, when standing in the intersection of IR1 and PS1 (in figure 2), turns on his PDA and starts Mobile Reality. The Mobile Reality framework detects IR1 and loads the corresponding VRML scene, and, as he is standing in PS1, the system informs him of his current location. The technician does not know the precise location of the defective printer, so he establishes a collaborative session with a colleague, who guides him along the correct corridor using the 3D co-browsing feature. While en route they discuss the potential problems over the voice channel. When the printer is in view they terminate the session. The technician enters PS6 as he approaches the printer, and the system announces that there is a printer in the vicinity called Dino. A context-sensitive speech bubble appears on his display listing the available speech commands. The technician issues a few of these commands, which Mobile Reality translates into diagnostic tests on the printer, the parameterized results of which are then verbalized by the system. If further assistance is necessary, he can establish another 3D co-browsing session with a second level of technical support, in which they can collaborate by speech and annotation on the 3D printer object. If the object is complex enough to support animation, then it may be possible to collaboratively explode the printer into its constituent parts during the diagnostic process.
Figure 2: The floor plan is augmented in (a) to illustrate the proximity sensor regions in yellow and the infrared beacon coverage zones in red, and in (b) to show the corresponding VRML viewpoint for each coverage zone.
3.2. Personalization Through Augmentation of Speech Interaction
Mixed and augmented reality techniques have focused on overlaying synthesized text or graphics onto a view of the real world, static real images or 3D scenes. The Mobile Reality framework now adds another dimension to augmentation. As the speech interaction is modeled separately from the VRML and specified in external XML resources, it is now easily possible to augment the 3D scene and personalize the interaction in terms of speech. Using this approach, the same 3D scene of the floor plan can be personalized in terms of speech interaction for a maintenance technician, electrician, HVAC technician, office worker, etc.
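To make this concrete, the fragment below shows one way role-based selection of the external speech-dialog resource could be wired up; the role labels, file names and helper function are hypothetical illustrations, as the paper does not disclose its XML dialog format.

#include <map>
#include <string>

// Hypothetical illustration only: the same VRML scene is paired with a
// different external speech-dialog resource depending on the user's role.
// The file names, role labels and this helper are assumptions, not the
// framework's actual resource-naming scheme.
std::string dialogResourceFor(const std::string& role) {
    std::map<std::string, std::string> byRole;
    byRole["maintenance"] = "floorplan-dialogs-maintenance.xml";
    byRole["electrician"] = "floorplan-dialogs-electrician.xml";
    byRole["hvac"]        = "floorplan-dialogs-hvac.xml";
    byRole["office"]      = "floorplan-dialogs-office.xml";
    std::map<std::string, std::string>::const_iterator it = byRole.find(role);
    return it != byRole.end() ? it->second : "floorplan-dialogs-default.xml";
}
// The 3D scene itself is unchanged; only the speech grammars and replies differ.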
4. System Architecture

Mobile Reality does not have a distributed client/server architecture; instead the framework runs entirely on a regular 64 MB Compaq iPAQ equipped with wireless LAN access and running the Microsoft Pocket PC operating system. As can be appreciated from figure 3, the Mobile Reality framework comprises four main components responsible for hybrid tracking, 3D graphics management, speech interaction and collaboration support. Each of these components is described in the following subsections.
Figure 3: High-level architecture of Mobile Reality.

4.1. Hybrid Tracking Solution
As alluded to earlier, one aim of the system was to provide an intuitive multimodal interface that facilitates a natural, one-handed navigation of the virtual environment. Hence, as the user moves around in the physical world, their location and orientation are tracked and the camera position in the 3D scene is adjusted correspondingly to reflect the movements.

While a number of tracking technologies have been proposed, Klinker et al [12] recognize that the most successful indoor tracking solutions will comprise two or more tracking technologies to create a holistic sensing infrastructure able to exploit the strengths of each technology. We subscribe to this philosophy. As can be seen in figure 4, two affordable technologies were selected that could be integrated with the current generation of PDAs without consuming excessive processor cycles. Infrared beacons able to transmit a unique identifier [5] over a distance of approximately 8 meters provide coarse-grained tracking, while a three degrees-of-freedom (3 DOF) inertia tracker from a head-mounted display provides fine-grained tracking. Hence, a component was developed that manages and abstracts this hybrid tracking solution and exposes a uniform interface to the framework.

An XML resource is read by the hybrid tracking component that relates each unique infrared beacon identifier to a three-dimensional viewpoint in a specified VRML scene. The infrared beacons transmit their unique identifiers twice every second. When the component reads a beacon identifier from the IR port it is interpreted in one of the following ways:

- Known beacon: If not already loaded, the 3D graphics management component loads a specific VRML scene and sets the camera position to the corresponding viewpoint.
- Unknown beacon: No mapping is defined in the XML resource for the beacon identifier encountered.

The 3 DOF inertia tracker is connected via the serial/USB port to the PDA. Every 100 ms the hybrid tracking component polls the tracker to read the values of the pitch (x-axis) and yaw (y-axis). Depending upon the values received, the data is interpreted in one of the following ways:

- Yaw value: The camera position in the 3D scene is adjusted accordingly. A tolerance of ±5 degrees was introduced to mitigate excessive jitter.
- Pitch value: A negative value moves the camera position in the 3D scene forwards, while a positive value moves it backwards. The movement forwards or backwards in the scene is commensurate with the depth of the tilt of the tracker.

One characteristic of the inertia tracker is that over time it drifts out of calibration. This effect is somewhat mitigated if the user moves periodically between beacons. Using the inertia tracker in its current form is not terribly practical, but in the near future it is entirely reasonable for such a chipset to be incorporated into a PDA.

The hybrid tracking component continually combines the inputs from the two sources to calculate and maintain the current position and orientation of the user, as sketched in the example below. The Mobile Reality framework is notified as changes occur; how this location information is exploited is described in the following subsections. The user can disable the hybrid tracking component at any time by unchecking the tracking checkbox on the user interface. In addition, at any time the user can override and manually navigate the 3D scene using either the stylus or the five-way joystick.
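A minimal C++ sketch of how this fusion could be structured follows. The three component interfaces are hypothetical stand-ins for the real IR-port driver, the 3 DOF tracker and the third party VRML control, whose APIs are not reproduced here; only the beacon-to-viewpoint mapping, the ±5 degree yaw tolerance and the 100 ms polling interval are taken from the text.

#include <cmath>
#include <map>
#include <string>

// Hypothetical stand-ins for the real hardware drivers and VRML control.
struct Viewpoint { std::string scene; float x, y, z, heading; };

struct BeaconReader   { virtual bool read(std::string& id) = 0; };           // IR port
struct InertiaTracker { virtual void poll(float& pitch, float& yaw) = 0; };   // serial/USB
struct SceneCamera    { virtual void jumpTo(const Viewpoint& v) = 0;
                        virtual void rotate(float degrees) = 0;
                        virtual void move(float forward) = 0; };

class HybridTracker {
public:
    HybridTracker(BeaconReader& b, InertiaTracker& t, SceneCamera& c,
                  const std::map<std::string, Viewpoint>& beaconMap)  // from the XML resource
        : beacons_(b), tracker_(t), camera_(c), map_(beaconMap), lastYaw_(0) {}

    // Called by the framework every 100 ms.
    void tick() {
        std::string id;
        if (beacons_.read(id)) {                          // coarse-grained: IR beacon
            std::map<std::string, Viewpoint>::const_iterator it = map_.find(id);
            if (it != map_.end() && id != lastBeacon_) {
                camera_.jumpTo(it->second);               // known beacon: jump to viewpoint
                lastBeacon_ = id;
            }                                             // unknown beacon: no mapping, ignore
        }
        float pitch = 0, yaw = 0;
        tracker_.poll(pitch, yaw);                        // fine-grained: inertia tracker
        if (std::fabs(yaw - lastYaw_) > 5.0f) {           // +/-5 degree jitter tolerance
            camera_.rotate(yaw - lastYaw_);
            lastYaw_ = yaw;
        }
        camera_.move(-pitch * 0.02f);                     // tilt forward -> move forward
                                                          // (the gain is an assumed value)
    }

private:
    BeaconReader& beacons_;
    InertiaTracker& tracker_;
    SceneCamera& camera_;
    std::map<std::string, Viewpoint> map_;
    std::string lastBeacon_;
    float lastYaw_;
};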
Figure 4: A hybrid solution comprising (a) infrared beacons for coarse-grained tracking and (b) an inertia tracker from a head-mounted display for fine-grained tracking.
4.2. 3D Graphics Management
One important element of the mobile multimodal interface is the 3D graphics. The 3D graphics management component of the framework relies heavily upon a third party VRML component [21] for this functionality. The VRML component has an extensive programmable interface. Hence, as the hybrid tracking component issues a notification that the user's position has changed, the 3D graphics management component interacts with the VRML component to adjust the camera position and maintain real-time synchronization between them.

The ability to offer location- and context-sensitive speech interaction was a key aim of this work. The approach selected was to exploit the VRML element called a proximity sensor. Proximity sensor elements are used to construct one or more invisible cubes that envelop any arbitrarily complex 3D objects in the scene that are to be speech-enabled. When the user is tracked entering one of these demarcated volumes in the physical world, which is subsequently mapped into the VRML view on the PDA, the VRML component issues a notification indicating which proximity sensor has been entered. A symmetrical notification is issued when a proximity sensor is left. The 3D graphics management component forwards these notifications, as sketched below, and hence enables proactive location-specific actions to be taken by the Mobile Reality framework.
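The forwarding of these enter/leave notifications could be organized along the lines of the listener sketch below; the class and callback names are assumptions for illustration and are not the actual API of the third party VRML component.

#include <string>
#include <vector>

// Implemented by interested components (e.g. the speech interaction manager).
struct ProximityListener {
    virtual void onSensorEntered(const std::string& sensorId) = 0;
    virtual void onSensorLeft(const std::string& sensorId) = 0;
};

class GraphicsManager {
public:
    void addListener(ProximityListener* l) { listeners_.push_back(l); }

    // Invoked when the VRML component reports a ProximitySensor enter/exit event.
    void handleProximityEvent(const std::string& sensorId, bool entered) {
        for (size_t i = 0; i < listeners_.size(); ++i) {
            if (entered) listeners_[i]->onSensorEntered(sensorId);
            else         listeners_[i]->onSensorLeft(sensorId);
        }
    }

private:
    std::vector<ProximityListener*> listeners_;
};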
4.3. Speech Interaction Management
As alluded to earlier, no intrinsic support for speech technologies is present within the VRML standard, hence a speech interaction management component was developed to fulfill this requirement. The speech interaction management component integrates and abstracts the ScanSoft RealSpeak TTS engine and the Siemens ICM Speech Recognition Engine.

As mentioned before, the 3D virtual counterparts of the physical objects nominated to be speech-enabled are demarcated using proximity sensors. An XML resource is read by the speech interaction manager that relates each unique proximity sensor identifier to a speech dialog specification. This additional XML information specifies the speech recognition grammars and the corresponding parameterized text string replies to be spoken. For example, when a maintenance engineer approaches a container tank he or she could enquire, “Current status?” To which the container tank might reply, “34% full of water at a temperature of 62 degrees Celsius.” Hence, if available, the Mobile Reality framework could obtain the values of “34”, “water” and “62” and populate the reply string before sending it to the TTS engine to be spoken (see the sketch below).

Anecdotal experience gleaned from colleagues at Philips speech research indicated that when users are confronted with a speech recognition system and are not aware of the permitted vocabulary, they tend to avoid using the system. To circumvent this situation, when a user enters the proximity sensor for a given 3D object the available speech commands can either be announced to the user, displayed on a “pop-up” transparent speech bubble (as shown in figure 5), or both. When the speech interaction management component receives a notification that a proximity sensor has been entered, it extracts from the XML resource the valid speech grammar commands associated with that specific proximity sensor. A VRML text node containing the valid speech commands can then be dynamically generated and displayed to the user. When the speech interaction management component receives a notification that the proximity sensor has been left, the speech bubble is destroyed.
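As an illustration of the parameterized replies, the sketch below fills live values into a reply template before handing the string to the TTS engine; the placeholder syntax and the helper function are assumptions, since the proprietary XML dialog format is not reproduced in this paper.

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Replace placeholders {0}, {1}, ... in the template with live values.
// The placeholder syntax is an assumption for illustration.
std::string fillTemplate(const std::string& tmpl,
                         const std::vector<std::string>& values) {
    std::string out = tmpl;
    for (size_t i = 0; i < values.size(); ++i) {
        std::ostringstream key;
        key << "{" << i << "}";
        std::string::size_type pos = out.find(key.str());
        if (pos != std::string::npos)
            out.replace(pos, key.str().size(), values[i]);
    }
    return out;
}

int main() {
    // Template based on the container-tank example in the text.
    std::string reply = "{0}% full of {1} at a temperature of {2} degrees Celsius.";
    std::vector<std::string> live;              // values obtained from the equipment
    live.push_back("34");
    live.push_back("water");
    live.push_back("62");
    std::cout << fillTemplate(reply, live) << std::endl;  // string passed to the TTS engine
    return 0;
}

In the real system the template and the grammar it answers would come from the entry in the XML dialog resource associated with the proximity sensor.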
Figure 5: “Pop-up” speech bubbles indicate that context-sensitive speech interaction is available at the current location.

The current implementation of the speech bubbles makes no attempt to follow the user's orientation. In addition, if the user approaches the speech bubble from the “wrong” direction, the text is unreadable because it appears in reverse. The appropriate use of a VRML signposting element should address these current limitations. When the speech recognition was initially integrated, the engine was configured to listen for valid input indefinitely upon entry into a speech-enabled proximity sensor.
However, this consumed too many processor cycles and severely impeded the VRML rendering. The solution chosen now requires the user to press the record button on the side of the iPAQ prior to issuing a voice command.

It is feasible for two overlapping 3D objects in the scene, and by extension the proximity sensors that enclose them, to contain one or more identical valid speech grammar commands. This raises the problem of deciding to which 3D object the command should be directed. Our solution is to detect the speech command collision automatically and resolve the ambiguity by querying the user further as to which 3D object the command should be applied, as sketched below.
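One way the collision detection could be realized is sketched here: the commands accepted by each currently entered proximity sensor are compared against the recognized utterance, and more than one match triggers a clarifying question to the user. The data structures and names are illustrative assumptions, not the framework's actual implementation.

#include <map>
#include <set>
#include <string>
#include <vector>

// For each currently entered proximity sensor: the commands its grammar accepts.
typedef std::map<std::string, std::set<std::string> > ActiveGrammars;

// Returns the sensors (i.e. 3D objects) whose grammars accept the recognized command.
std::vector<std::string> matchCommand(const ActiveGrammars& active,
                                      const std::string& command) {
    std::vector<std::string> matches;
    for (ActiveGrammars::const_iterator it = active.begin(); it != active.end(); ++it)
        if (it->second.count(command))
            matches.push_back(it->first);
    return matches;
}

// Caller logic: zero matches -> reject the utterance; one match -> dispatch the
// command to that object; more than one match -> a collision, so ask the user
// which 3D object the command should be applied to.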
4.4. Mobile Collaboration Support
At any moment, the user can issue a speech command to open a collaborative session with a remote party. In support of mobile collaboration, the Mobile Reality framework offers three features:

- A shared 3D co-browsing session
- Annotation support
- A full-duplex voice-over-IP channel for spoken communication

A shared 3D co-browsing session enables the following functionality. As the initiating user navigates through the 3D scene on her PDA, the remote user simultaneously experiences the same view of the navigation on his device, subject to network latency. This is accomplished by capturing the coordinates of the camera position during the navigation and sending them over the network to the remote system. The remote system receives the coordinates and adjusts its camera position accordingly. A simple TCP sockets-based protocol was implemented to support shared 3D co-browsing; a sketch of the kind of message exchanged is given below. The protocol includes:

- Initiate: When activated, the collaboration support component prompts the user to enter the network address of the remote party, and then attempts to contact the remote party to request a collaborative 3D browsing session.
- Accept/Decline: Reply to the initiating party either to accept or decline the invitation. If accepted, a peer-to-peer collaborative session is established between the two parties and the same VRML file is loaded by the accepting PDA.
- Passive: The initiator of the collaborative 3D browsing session is by default assigned control of the session. At any stage during the co-browsing session, the person in control can elect to become passive. This has the effect of passing control to the other party.
- Hang-up: Either party can terminate the co-browsing session at any time.

Although incomplete, the prototype implementation is being extended to support shared dynamic annotation of the VRML scene using colored ink. A preliminary version of this can be seen in figure 6. A variety of techniques are being explored for maintaining the synchronicity and scale of the annotation with respect to the scene as the camera position changes during navigation.
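The wire format of the co-browsing protocol is not published, but a message set along the following lines would support the behavior described above; the field layout and text encoding are assumptions for illustration only.

#include <cstdio>
#include <string>

// Message types mirroring the protocol steps listed above, plus the
// camera updates streamed during navigation.
enum MessageType { INITIATE, ACCEPT, DECLINE, PASSIVE, HANGUP, CAMERA_UPDATE };

struct CoBrowseMessage {
    MessageType type;
    float x, y, z;        // camera position (CAMERA_UPDATE only)
    float yaw, pitch;     // camera orientation (CAMERA_UPDATE only)
};

// Serialize a camera update into a simple line-based text form for the TCP socket.
std::string encodeCameraUpdate(const CoBrowseMessage& m) {
    char buf[96];
    std::snprintf(buf, sizeof(buf), "CAM %.3f %.3f %.3f %.3f %.3f\n",
                  m.x, m.y, m.z, m.yaw, m.pitch);
    return std::string(buf);
}
// The receiving peer parses each line and adjusts its local camera accordingly.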
Rather than invest time developing and integrating a proprietary voice-over-IP solution, a third party application was selected that provides a low-bandwidth full-duplex voice communication channel between the two parties [17]. This application is started once the co-browsing session has been established. In order to simplify the prototype development, the authors tested the system using two iPAQs equipped with 802.11b cards, both running the same Mobile Reality software.
Figure 6: 3D co-browsing with annotation and full-duplex audio support.
5. Future Work

As Klinker et al [12] conclude, a successful indoor tracking solution will comprise multiple tracking technologies deployed in different environmental conditions so as to best exploit their operational characteristics. As such, further extensions to the hybrid tracking component are planned. Technologies such as location-enabled 802.11b and Ultra-Wide Band (UWB) merit deeper evaluation.

Although the current implementation of the speech interaction management component uses XML to specify the speech dialogs, this specification is proprietary. Future work includes adopting an industry standard for speech interaction, such as VoiceXML [27] or SALT [25]. These two languages have intrinsic support for scripting, hence facilitating greater dynamism. For example, in response to a speech command a script could perform a network database query and, upon return, filter the results before speaking them aloud. We will also integrate noise reduction preprocessing technology to significantly improve speech recognition accuracy in industrial environments [1].
Additional work on the collaboration support component is also required. As mentioned earlier, a variety of techniques are being explored for maintaining the synchronicity and scale of the annotation with respect to the scene as the camera position changes during navigation. We also wish to replace the current third party voice communication solution with a standards-based SIP/RTP implementation to facilitate wider interoperability.

As described above, the speech interaction is triggered when the maintenance engineer enters the vicinity of speech-enabled equipment. However, there are occasions when the engineer would prefer to remain stationary or cannot reach the equipment. Hence, we are investigating techniques for enabling the engineer to activate a proximity sensor by selecting the piece of equipment in the scene either by voice or with the stylus.

It has been acknowledged that preparing and generating 3D content can be prohibitively time-consuming. Some research exists that can automatically generate VRML from 2D floor plans [14], such as those in figure 2. Technology such as this has the potential to lower the barrier for technologies such as Mobile Reality to enter the marketplace.
6. Conclusions
Described in this paper is how the Mobile Reality framework, running entirely on a Pocket PC, synchronizes a hybrid tracking solution to offer the user a seamless, location-dependent, mobile multimodal interface. The user interface juxtaposes a three-dimensional graphical view with a context-sensitive speech dialog centered upon objects located in the immediate vicinity of the mobile user. In addition, support for collaboration enables shared VRML browsing with annotation and a full-duplex voice channel. To the knowledge of the authors, this is the first reported location-based VRML system running on a regular commercial PDA equipped with context-sensitive speech interaction. The system seamlessly fuses technologies that have until now been the exclusive domain of more powerful, heavy-duty wearable systems.
7. Acknowledgements
The authors wish to acknowledge and thank Antonio Krueger for his participation and contributions to various discussions relating to the Mobile Reality framework. Thanks are due also to Klaus Lukas and Steffen Harengel at Siemens ICM Speech Center for their embedded speech recognition support.
8. References

1. Aalburg, S., Beaugeant, C., Stan, S., Fingscheidt, T., Balan, R. and Rosca, J., Single and Two-Channel Noise Reduction for Robust Speech Recognition, to appear in ISCA Workshop on Multi-Modal Dialogue in Mobile Environments, June 2002.
2. Abowd, G., Atkeson, C., Dey, A., Hong, J., Long, S., Kooper, R. and Pinkerton, M., Cyberguide: A Mobile Context-Aware Tour Guide, ACM Wireless Networks, 3:421-433, November 1997.
3. Beigl, M., MemoClip: A Location-Based Remembrance Appliance, Journal of Personal Technologies, 4(4):230-234, Springer Press, 2000.
4. Dai, F., Virtual Reality for Industrial Applications, Springer-Verlag, 1998.
5. Eyeled GmbH, Saarbrücken, Germany, http://www.eyeled.de/
6. Fitzmaurice, G., Zhai, S. and Chignell, M., Virtual Reality for Palmtop Computers, ACM Transactions on Office Information Systems, 11(3):197-218, July 1993.
7. Goose, S., Gruber, I., Sudarsky, S., Hampel, K., Baxter, B. and Navab, N., 3D Interaction and Visualization in the Industrial Environment, Proceedings of the 9th International Conference on Human-Computer Interaction, New Orleans, USA, Volume 1, pages 31-35, August 2001.
8. Grasso, M., Ebert, D. and Finin, T., The Integrality of Speech in Multimodal Interfaces, ACM Transactions on Computer-Human Interaction, 5(4):303-325, December 1998.
9. Hinckley, K., Pierce, J., Sinclair, M. and Horvitz, E., Sensing Techniques for Mobile Interaction, ACM UIST, San Diego, USA, November 2000.
10. Hull, R., Neaves, P. and Bedford-Roberts, J., Towards Situated Computing, Proceedings of the IEEE First International Symposium on Wearable Computing, Cambridge, USA, pages 146-153, October 1997.
11. Infrared Data Association: http://www.irda.org
12. Klinker, G., Reicher, T. and Bruegge, B., Distributed User Tracking Concepts for Augmented Reality Applications, Proceedings of ISAR 2000, Munich, Germany, pages 37-44, October 2000.
13. Kortuem, G., Segall, Z. and Thompson, T., Close Encounters: Supporting Mobile Collaboration through Interchange of User Profiles, Proceedings of the First International Symposium on Handheld and Ubiquitous Computing, Karlsruhe, Germany, pages 171-185, September 1999.
14. Lewis, R. and Séquin, C. H., Generation of Three-Dimensional Building Models from Two-Dimensional Architectural Plans, Computer-Aided Design, 30(10):765-779, 1998.
15. Luff, P. and Heath, C., Mobility in Collaboration, Proceedings of CSCW '98, Seattle, USA, November 1998.
16. Marmasse, N. and Schmandt, C., Location-Aware Information Delivery with ComMotion, Proceedings of the Second International Symposium on Handheld and Ubiquitous Computing, Bristol, U.K., pages 157-171, September 2000.
17. Microsoft Portrait, http://research.microsoft.com/~jiangli/portrait/
18. Mynatt, E., Back, M., Want, R., Baer, M. and Ellis, J., Designing Audio Aura, ACM International Conference on Computer-Human Interaction, Los Angeles, USA, pages 566-573, 1998.
19. Newman, J., Ingram, D. and Hopper, A., Augmented Reality in a Wide Area Sentient Environment, Proceedings of ISAR 2001, New York, USA, pages 77-86, October 2001.
20. Nilsson, J., Sokoler, T., Binder, T. and Wetcke, N., Beyond the Control Room: Mobile Devices for Spatially Distributed Interaction on Industrial Process Plants, Proceedings of the Second International Symposium on Handheld and Ubiquitous Computing, Bristol, U.K., pages 30-45, September 2000.
21. Parallel Graphics, http://www.parallelgraphics.com/products/cortonace
22. Rekimoto, J. and Ayatsuka, Y., CyberCode: Designing Augmented Reality Environments with Visual Tags, Proceedings of Designing Augmented Reality Environments (DARE 2000), 2000.
23. Ressler, S. and Wang, Q., Making VRML Accessible for People with Disabilities, ASSETS '98, Marina del Rey, USA, pages 50-55, April 1998.
24. Saffo, P., Sensors: The Next Wave of Infotech Innovation, Institute for the Future: 1997 Ten-Year Forecast, pages 115-122.
25. SALT Forum, http://www.saltforum.org/
26. Smailagic, A. and Bennington, B., Wireless and Mobile Computing in Training, Maintenance and Diagnosis, IEEE Vehicular Technology Conference, Phoenix, AZ, May 1997.
27. VoiceXML: http://www.voicexml.org
28. Wan, D., Magic Medicine Cabinet: A Situated Portal for Consumer Healthcare, Proceedings of the First International Symposium on Handheld and Ubiquitous Computing, Karlsruhe, Germany, pages 352-355, September 1999.
29. Want, R., Hopper, A., Falcao, V. and Gibbons, J., The Active Badge Location System, ACM Transactions on Information Systems, 10(1):91-102, 1992.
30. Zhai, S., Milgram, P. and Drascic, D., An Evaluation of Four 6 Degree-of-Freedom Input Techniques, ACM Conference on Human Factors in Computing Systems, Amsterdam, Netherlands, 1993.
31. Zhang, X. and Navab, N., Tracking and Pose Estimation for Computer Assisted Localization in Industrial Environments, IEEE Workshop on Applications of Computer Vision, pages 214-221, 2000.
Proceedings of the Second International Symposium on Handheld and Ubiquitous Computing, Bristol, U.K., pages 30-45, September 2000. Parallel Graphics, http://www.parallelgraphics.com/products/cortonace Rekimoto, J. and Ayatsuka, Y., Cybercode: Designing Augmented Reality Environments With Visual Tags, Designing Augmented Reality Environments, 2000. Ressler, S. and Wang, Q., Making VRML Accessible for People with Disabilities, ASSETS 98, Marina del Rey, USA, pages 50-55, April 1998. Saffo, P., Sensors: The Next Wave of Infortech Innovation, Institute for the Future: 1997 Ten-Year Forecast, pages 115-122. SALT Forum, http://www.saltforum.org/ Smailagic, A. and Bennington, B., Wireless and Mobile Computing in Training Maintenance and Diagnosis, IEEE Vehicular Technology Conference, Phoenix, AZ, May 1997. VoiceXML: http://www.voicexml.org Wan, D., Magic Medicine Cabinet: A Situated Portal for Consumer Healthcare, Proceedings of the First International Symposium on Handheld and Ubiquitous Computing, Karlsruhe, Germany, pages 352-355, September 1999. Want, R., Hopper, A., Falcao, V. and Gibbons, J., The Active Badge Location System, ACM Transactions on Information Systems, 10(1):91-102, 1992. Zhai, S., Milgram, P. and Drasic, D., An Evaluation of four 6 Degree-of-Freedom Input Techniques, ACM Conference on Human Factors in Computer Systems, Asmterdam, Netherlands, 1993. Zhang, X. and Navab, N., Tracking and Pose Estimation for Computer Assisted Localization in Industrial Environments, IEEE Workshop on Applications of Computer Vision, pages 241- 221, 2000.