Combining Speech and Pen Input for Effective Interaction in Mobile GeoSpatial Environments∗

Julie Doyle [email protected]

Michela Bertolotto [email protected]

School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland

ABSTRACT

Relatively little research has been conducted into designing interfaces that allow GIS users to interact effectively with geospatial data in mobile environments. Users on the move face limited interaction modalities. The standard mode of input on mobile devices is the pen or stylus, which some users find difficult or too time-consuming to use. Voice commands, combined with pen input, provide an attractive alternative for interacting with mobile GIS, as speech is a natural form of interaction. However, the combination of speech and pen input in mobile GIS remains relatively unexplored. To this end, we have developed a multimodal interface to a mobile GIS that provides users with the ability to freely switch between modalities to suit their current task or environment.

1. INTRODUCTION

The quality of the user interface has a great bearing on the utility of a Geographic Information System (GIS) [3]. To be truly accepted and adopted, a GIS interface must be both aesthetically pleasing and natural to use. As applications have become more complex, a single modality no longer permits the user to interact effectively, or naturally, across all tasks and environments [4]. This is especially true in mobile environments, as limitations inherent in mobile devices, particularly limited screen space and input techniques, directly affect how GIS users can interact with applications on such devices. For example, a field worker in motion may find it difficult to enter a textual annotation using a pen. In such a situation, speech may be preferable, as it is natural for people to speak whilst moving. Virtual keyboards are provided on mobile devices for textual input, but these occupy a considerable amount of already limited screen space. Furthermore, many users find interacting via pen and virtual keyboard cumbersome.

In such situations, a multimodal interface combining speech and pen input can play a vital role in simplifying human-computer interaction in mobile GIS: offering two or more input modalities allows users to choose the mode of interaction that best suits their current task and environment. This leads to increased efficiency and reduced interface complexity when interacting with map-based applications.

Speech recognition technology offers many advantages in GIS, particularly field GIS, where users are continually mobile. In such situations it may be necessary for users to interact entirely via voice commands. Speech interaction is particularly valuable for mobile GIS applications, where a user's hands and eyes may be busy with other tasks. Its ease of use, coupled with the intuitiveness of familiar voice commands, means that incorporating speech recognition into a mobile GIS can greatly enhance a user's geospatial experience.

While the integrated use of speech and pen provides a flexible and powerfully expressive means of interaction [5], situations arise where a particular mode of input is unsuitable. Speech input may not work well in noisy outdoor environments, for example, or it may be socially undesirable to issue voice commands to an application. In such circumstances the pen can be engaged. The pen can be used to point, to select visible objects as the mouse does in a direct manipulation interface, and as a means of microphone engagement [6]. Consequently, the pen plays a fundamental role in interacting with a mobile application, and it is therefore important to design flexible, robust interfaces that users can operate via a variety of input modalities.

We are developing a mobile GIS that can be used to carry out a variety of real-world tasks by both professional and novice users. Our system allows such users to navigate, query, annotate and manipulate spatial data using pen input, speech input, or a combination of both.

∗The support of the Informatics Research Initiative of Enterprise Ireland is gratefully acknowledged.

2. OUR PROTOTYPE

We have developed a mobile GIS, CoMPASS (Combining Mobile Personalized Applications with Spatial Services), that allows users to visualise and interact with geospatial data through a visual display on a Tablet PC. CoMPASS provides mobile GIS users with highly accurate, fine-grained and personalised spatial information [7]. When users log onto the CoMPASS system, a vector-based, personalised map is returned to them based on both their current location and their previous interactions with the system. The vector data is stored in Geography Markup Language (GML) [1] file format in a remote spatial database. This GML data is transferred over a wireless network to the mobile client, where it is parsed and transformed into graphical Java objects. These Java objects are then displayed within a mapping interface, OpenMap™ [2].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC'06, April 23-27, 2006, Dijon, France. Copyright 2006 ACM 1-59593-108-2/06/0004 ...$5.00.

Figure 1: Multimodal Interface on a Tablet PC

Users can interact with the CoMPASS GUI to manipulate spatial data using standard GIS functionality, e.g. changing the scale of maps, zooming and panning, switching map features on/off, changing the colour of map features, annotating features on the map and viewing other users' annotations. Spatial queries are also available through the GUI, for example highlighting features or finding the distance between two points on a map.

The speech module of CoMPASS includes both speech recognition (speech-to-text) and speech synthesis (text-to-speech) capabilities. It is responsive yet unobtrusive; if users wish to interact via speech they must explicitly request this by clicking the 'Speech On' button. An icon then appears on the interface, indicating to the user that they can now interact using speech. When the speech recognition engine is first turned on, the system informs the user that they can issue the command 'help' to view a list of the available commands for interacting with CoMPASS. All interface actions can be executed using voice commands. Allowing users to interact via speech both reduces the complexity of the system and increases efficiency: issuing simple, brief voice commands is more natural than locating the correct GUI component and using multiple pen-clicks to carry out a particular action. However, as noted above, speech may not be the best mode of input in certain situations, particularly noisy outdoor environments, so providing the user with the option of interacting via pen or speech is a necessity.
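The GML-parsing step of this pipeline can be sketched in Java as follows. This is not the CoMPASS source: the feature schema, tag names and the in-memory representation are illustrative assumptions, and a real client would construct OpenMap graphics rather than print coordinates.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class GmlPointParser {
    public static void main(String[] args) throws Exception {
        // Tiny hand-written GML fragment standing in for data fetched from
        // the remote spatial database; the actual feature schema used by
        // CoMPASS is not given in the paper, so this structure is hypothetical.
        String gml =
            "<FeatureCollection xmlns:gml=\"http://www.opengis.net/gml\">" +
            "  <gml:Point><gml:coordinates>-6.22,53.31</gml:coordinates></gml:Point>" +
            "  <gml:Point><gml:coordinates>-6.25,53.34</gml:coordinates></gml:Point>" +
            "</FeatureCollection>";

        // Parse the GML (it is plain XML) into a DOM tree. The default
        // factory is not namespace-aware, so tag names keep their prefix.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(gml.getBytes(StandardCharsets.UTF_8)));

        // Extract each coordinate pair; a real client would build graphical
        // Java objects here and hand them to the OpenMap display layer.
        NodeList coords = doc.getElementsByTagName("gml:coordinates");
        for (int i = 0; i < coords.getLength(); i++) {
            String[] xy = coords.item(i).getTextContent().split(",");
            System.out.println("point lon=" + xy[0] + " lat=" + xy[1]);
        }
    }
}
```

Running this prints one `point lon=… lat=…` line per GML point, confirming that the coordinates survive the round trip from markup to Java objects.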
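The command-driven interaction described above can be illustrated with a minimal dispatcher that maps recognised phrases to interface actions. Again, this is a sketch rather than the CoMPASS implementation: the command names follow the paper's examples ('help' and typical map operations), but the handler bodies and class name are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VoiceCommandDispatcher {
    // Ordered map from recognised phrase to the interface action it triggers.
    private final Map<String, Runnable> commands = new LinkedHashMap<>();

    public VoiceCommandDispatcher() {
        commands.put("zoom in",   () -> System.out.println("map: zoom in"));
        commands.put("zoom out",  () -> System.out.println("map: zoom out"));
        commands.put("pan north", () -> System.out.println("map: pan north"));
        // 'help' lists every available command, as the paper describes.
        commands.put("help", () ->
            commands.keySet().forEach(c -> System.out.println("command: " + c)));
    }

    // Called once per recogniser result; unknown phrases are reported,
    // never executed, so a misrecognition cannot trigger a map action.
    public void onRecognised(String phrase) {
        Runnable action = commands.get(phrase.toLowerCase().trim());
        if (action != null) {
            action.run();
        } else {
            System.out.println("unrecognised: " + phrase);
        }
    }

    public static void main(String[] args) {
        VoiceCommandDispatcher d = new VoiceCommandDispatcher();
        d.onRecognised("zoom in");
        d.onRecognised("help");
        d.onRecognised("fly to the moon");
    }
}
```

Because every GUI action has a phrase in the map, the table itself documents the voice vocabulary, which is what makes a command like 'help' easy to generate automatically.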

3. CONCLUSIONS

Multimodal systems that process a combination of speech and pen input are an exciting research paradigm, particularly in the context of mobile applications. We have described a mobile GIS that is capable of processing multimodal input from users. This multimodal input consists of a combination of speech and pen input for navigating, querying, annotating and manipulating spatial data within a mobile environment. A user evaluation, outside the scope of this paper, has been carried out; its results demonstrate the efficiency and effectiveness of our interface for mobile GIS users. As a Tablet PC may only be available to professional users, we are currently transferring the system to a PDA in order to attract a wider range of users.

4. REFERENCES

[1] Open Geospatial Consortium Inc. (OGC). http://www.opengeospatial.org.
[2] OpenMap™. http://openmap.bbn.com.
[3] D. P. Lanter and R. Essinger. User-centered graphical user interface design for GIS. Technical Report 91-6, National Center for Geographic Information and Analysis, California, USA, 1991.
[4] J. Larson, S. Oviatt, and D. Ferro. Designing the user interface for pen and speech applications. In CHI '99 Workshop, Conference on Human Factors in Computing Systems, Philadelphia, USA, 1999.
[5] S. Oviatt. User-centered modeling for spoken language and multimodal interfaces. IEEE Multimedia, 3(4):26–35, 1996.
[6] S. Oviatt, P. Cohen, L. Wu, J. Vergo, L. Duncan, B. Suhm, J. Bers, T. Holzman, T. Winograd, J. Landay, J. Larson, and D. Ferro. Designing the user interface for multimodal speech and pen-based gesture applications: State-of-the-art systems and future research directions. Human-Computer Interaction, 15(4):263–322, 2000.
[7] J. Weakliam, D. Lynch, J. Doyle, M. Bertolotto, and D. Wilson. Delivering personalized context-aware spatial information to mobile devices. In 5th International Workshop on Web and Wireless Geographical Information Systems, 2005.
