Natural DVI based on intuitive hand gestures

Andreas Riener, Michael Rossbory and Alois Ferscha
Institute for Pervasive Computing, JKU Linz, Austria
Phone +43/732/2468-1432, Email {lastname}@pervasive.jku.at

Abstract. Drivers operating a car today are increasingly challenged and/or distracted by the multiplicity of information in and around the vehicle. As a possible solution, we introduce the approach of intuitive driver-vehicle operation based on natural (hand) gestures. The goal of this promising type of interface is to let drivers interact with the vehicle effortlessly, i.e., in the background and without adding cognitive load to the otherwise demanding driving task. To show its potential, we discuss in this position paper a systematic, step-oriented procedure for applying it to the control of an email system while driving.

Keywords. Gestural interaction, depth image recognition, driver-vehicle interface, natural vehicle control.

1 Overstrained vehicle operation

Recent developments in the field of (pervasive) information and communication technology have found their way into the vehicular domain, providing the driver with real-time information (car status, POIs, traffic conditions), connecting him/her to the rest of the world (via mobile phone or an Internet browser in the dashboard), and making driving safer through safety systems such as ACC or LDW. On the other hand, however, the driver is more and more challenged to cope with information delivered concurrently, and most of the time without a clear indication of the importance of a given message. The result (which is, of course, not new) is mental overload and distraction; public authorities in many countries have recognized this exposure and passed bills against it, for example prohibiting cell phone use while driving.

With the broad availability of high-bandwidth wireless Internet connections and of applications and data stored in "the cloud" on the one hand, and high-resolution (multi-touch) displays commonly integrated into the center console or dashboard on the other, a new challenge is "born": the way is now clear for integrating even the workplace (office software, email client) into the car. (Clearly, using a tool such as a spreadsheet program for bookkeeping while driving would be almost impossible; however, a word processor or an email client, in particular one that accepts spoken input and/or provides spoken output, would make sense.)

Current solutions: unsatisfactory. The increase in both quantity and complexity of in-vehicle operation tasks is an important factor contributing to driver distraction. Car manufacturers are therefore increasingly required to provide easily understandable, intuitively operable interfaces that reduce workload. As driving is mostly a visual task (as confirmed, e.g., by "the most dominant source of danger in vehicles is not looking in the appropriate direction [..]" [1]), the visual sense should be used sparingly, both for output (reading information from displays) and for providing input to the car (e.g., looking at a display or control element while turning a knob or operating a touch-sensitive display).

Voice control using microphones placed in the passenger compartment is one possible way to address this problem [2]. Its usability was shown by BMW in the "spoken dialog system" (SDS) prototype, a voice-control extension to the iDrive [3]. Unfortunately, the robustness and effectiveness of in-car voice recognition systems is, in general, still poor [4] (the audio channel is corrupted by engine and environmental noise, cell phone use, and passenger conversations). Camera tracking of lip movements is immune to these acoustic factors and is therefore used together with voice recognition to further improve the recognition rate. But this combined approach also poses problems: the driver has to look into the camera to be recognized, and, more importantly, other conversations, whether on the mobile phone or with passengers, are almost impossible while such a system is in use.

A combination of knob- and button-based input, placed within easy reach, is still the preferred method for taking input from the user. Mercedes, for instance, uses many knobs and buttons on a flat menu structure in its control system, whereas BMW and Audi rely on fewer multi-functional knobs ("iDrive", "MMI") in conjunction with rather complex menu navigation on an additional display. Recent research no longer relies on knobs and buttons but tries to use gestures as a less distracting input method [5]. Unfortunately, all existing solutions (for an overview of the state of the art see [6], [7]) have in common that they track gestures with (CCD) cameras, with their well-known limitations.

Research approach: In order to provide a solution for intuitive, non-distracting user input while operating a car, and at the same time to address the problems identified above (visual, auditory, and gestural input), we propose to capture hand gestures in the passenger compartment, more specifically in the gearshift area, using depth images to overcome the limitations of standard CCD cameras.

– Hypothesis (H.i) Hand gestures recorded with an RGB-D camera can be used for natural (intuitive, non-distracting) driver-vehicle interaction in real time.

As a first attempt we employ a system based on the depth camera integrated into the Microsoft Kinect. To show the potential of natural hand-gesture control, we chose realistic gestures, mapped them to functions of an email client, and tested the system in a real car. To further avoid visual distraction, feedback from the email client was given auditorily, i.e., by reading emails aloud.
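To make the proposed sensing stage more concrete, the sketch below illustrates one plausible first processing step: segmenting the driver's hand from a single depth frame by thresholding the depth values around the gearshift area. This is a minimal illustration, not the authors' actual implementation; the depth window, the blob-size threshold, and the synthetic test frame are assumptions, and a real system would read frames from a Kinect driver (e.g., libfreenect or OpenNI) at about 30 fps.

```python
import numpy as np
import cv2  # OpenCV >= 4 assumed for the findContours signature

# Hypothetical calibration: the interaction volume above the gearshift lies
# within this depth window (millimeters from the camera).
NEAR_MM, FAR_MM = 500, 900
MIN_HAND_AREA = 2000  # pixels; reject small blobs (noise, the shift knob)

def segment_hand(depth_mm: np.ndarray):
    """Return the largest hand-sized contour in a 16-bit depth frame,
    or None if nothing plausible is found."""
    # Keep only pixels inside the interaction volume.
    mask = ((depth_mm > NEAR_MM) & (depth_mm < FAR_MM)).astype(np.uint8) * 255
    # Remove speckle noise typical of structured-light depth sensors.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)
    return hand if cv2.contourArea(hand) >= MIN_HAND_AREA else None

# Usage with a synthetic frame standing in for a live Kinect stream:
frame = np.full((480, 640), 2000, dtype=np.uint16)  # background far away
frame[200:300, 250:350] = 700                       # fake "hand" in the window
print("hand detected" if segment_hand(frame) is not None else "no hand")
```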

2 Case study: Email client operation while driving

Reading and processing emails is a standard, well-understood work task that takes up more and more time every day. It is therefore well suited as a test case, since such a system, if installed, can be expected to find broad user acceptance, at least if it is intuitively operable and does not distract from driving.

Step I – Feature identification and reduction: Current email clients offer dozens to hundreds of functions, of which only a subset can be recalled by the driver. Emails are by default organized chronologically, with the most recent messages on top; they can be merged into groups such as "yesterday", "this week" or "last month" (a small grouping sketch follows Fig. 1). Moreover, email clients provide user-defined subfolders for structured archiving, and grouping functions allow emails to be organized by sender, topic, etc. This organization of email according to user preference also has to be considered for in-car systems (i.e., navigation through time- or group-based email hierarchies). Once a certain email has been selected, the message itself consists of two parts: the header (sender, list of recipients, subject, additional information such as routing and protocols) and the content. For a self-contained system it should also be possible to "jump" between these individual parts. Besides the browsing functionality, corresponding gestures must also enable high-level functions such as (de-)activating the client, pausing and resuming reading, deleting emails, or marking them for further processing.

Taxonomy (form) | Hand gesture | Email function(s) | Email detailed function description
static pose | A | 1 | I. General: activate/deactivate email system
static pose and path | B | 2, 3; 4, 5; 6, 7; 8, 9 | II. Time navigation: next/prev. email; hour +/-; day +/-; week +/-
static pose and path | C | 10?; 11, 12 | III. Content navigation: pattern search (regex); folder-based navigation +/- (research/teaching/...)
static pose (D), static pose and path (E), one-point touch (F) | D, E, F | 13, 14; 15; 16; 17, 18; 19; 20; 21 | IV. Individual email: jump through fields +/- (sender, rec., subject, content); pause/resume; stop reading; fast forward/rewind; delete email; repeat email; mark email (highlight for further processing)

Fig. 1. Favored functions of an in-car email client and the subset selected. Common to all gestures are their metaphorical nature, world-independent binding, and continuous flow [8].
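As an aside, the time-based grouping described in Step I can be illustrated with a few lines of code. The sketch below buckets emails by age into the groups named above; the Email stand-in type and the bucket thresholds are assumptions made for the example, not part of the described system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Email:  # minimal stand-in for a real message object
    subject: str
    received: datetime

def time_bucket(mail: Email, now: datetime) -> str:
    """Assign an email to one of the chronological groups from Step I.
    Thresholds are illustrative; "last month" serves as the catch-all."""
    age = now - mail.received
    if age < timedelta(days=1):
        return "today"
    if age < timedelta(days=2):
        return "yesterday"
    if age < timedelta(weeks=1):
        return "this week"
    return "last month"

# Usage: list the inbox newest-first, grouped the way the client would read it.
now = datetime.now()
inbox = [Email("status report", now - timedelta(hours=3)),
         Email("meeting minutes", now - timedelta(days=4))]
for mail in sorted(inbox, key=lambda m: m.received, reverse=True):
    print(time_bucket(mail, now), "-", mail.subject)
```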

To confine the number of functions to be implemented to an optimal subset, a small user survey was conducted with salesmen who travel a lot and who confirmed frequent email use. The participants were asked about (1) their current habits in using email clients and (2) how they envision using email in the car while driving; this yielded thoughts and

wishes, but also criticism of such a system. The result of the survey is a set of around 20 tasks identified as core functionality for future in-car use while driving (Fig. 1). (Some desired functions were left out because they are infeasible for gestural interaction.)
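For illustration, the gesture-to-function mapping of Fig. 1 could be captured in software as a simple dispatch table. The following sketch assumes a hypothetical EmailClient interface; the method names, and the assignment of gestures D-F to specific group-IV functions, are illustrative guesses rather than the authors' framework API.

```python
# Hypothetical mapping of the six gestures of Fig. 1 to email-client actions.
class EmailClient:  # stub standing in for the real (unpublished) client API
    def toggle_active(self):  print("I.   toggle email system on/off")
    def next_email(self):     print("II.  next email")
    def open_folder(self, f): print(f"III. open folder '{f}'")
    def pause_resume(self):   print("IV.  pause/resume reading")
    def jump_field(self, f):  print(f"IV.  jump to field '{f}'")
    def mark_current(self):   print("IV.  mark email for processing")

GESTURE_ACTIONS = {
    "A": lambda c: c.toggle_active(),          # static pose:          on/off
    "B": lambda c: c.next_email(),             # static pose and path: time nav.
    "C": lambda c: c.open_folder("research"),  # static pose and path: content nav.
    "D": lambda c: c.pause_resume(),           # static pose:          pause/resume
    "E": lambda c: c.jump_field("sender"),     # static pose and path: field nav.
    "F": lambda c: c.mark_current(),           # one-point touch:      mark email
}

def dispatch(gesture_id: str, client: EmailClient) -> None:
    """Trigger the email action bound to a recognized gesture, if any."""
    action = GESTURE_ACTIONS.get(gesture_id)
    if action is not None:
        action(client)

dispatch("B", EmailClient())  # -> "II.  next email"
```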

Step II – Gesture definition and mapping: Agreement on gestures is one of the most important aspects of UI design for keeping cognitive workload low and avoiding distraction caused by recall effort or inappropriate mapping. Gestures therefore need to be chosen application-specifically and mapped to certain activities. For example, waving the hand from left to right would intuitively be understood as "next email", and turning the thumb up could be mapped to "increase the reading speed" (see Figs. 2 and 3). The definition of the gesture set in the course of this

[Figure: three panels showing Gesture A "system on/off" (1: system activation, binary), Gesture D "pause/resume" (2: pause read-out), and Gesture F "mark email" (3: mark for further processing).]

Fig. 2. System (de-)activation, pause/resume reading, mark email (left to right).

work is motivated by the groundwork of [8]. According to the taxonomy presented there, our gestures are of metaphorical nature, world-independent binding, and continuous flow. With respect to form, we employ static poses with and without path as well as one-point touch poses (Fig. 1). To avoid defining unintuitive gestures, their selection and the corresponding mapping to activities was again supported by a survey (the features were presented, and participants were asked which hand pose or gesture they would intuitively connect to a certain function). The most notable finding is that most participants proposed a kind of "wiping" gesture for inter-mail navigation, using either the whole hand with fingers outstretched or the pointing finger only. All test persons used a "move the hand away from the body" gesture to browse into the past, and the opposite direction to browse toward the current date/time. For intra-mail navigation, wiping/waving gestures to the left/right with the pointing finger, thumb, or all fingers outstretched were recommended (Fig. 3; a minimal direction-classification sketch follows the figure). Deletion of emails was interpreted differently, either as a gesture of "throwing something over the shoulder", "wiping the entire hand away from the body" (indicating trashing something), or "squeezing a sheet of paper". For marking emails, some participants came up with the idea of pointing up/down with the trigger finger or thumb to increase/decrease the priority of an email; others suggested using this gesture to increase/decrease the reading speed. Keeping the hand still in front, with the fingers closed together (as if instructing a person to stop), was the gesture recommended for pausing reading.

[Figure: Gesture B "next email", shown as a sequence of five snapshots.]

Fig. 3. Gesture "next email" (shown as a sequence of five snapshots). It was changed in the final setting so as not to conflict with the one-point touch gesture "mark" (Fig. 1).
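As an illustration of how a "static pose and path" gesture such as "next email" might be separated from its mirror image ("previous email"), the sketch below classifies a hand-centroid trajectory by its dominant horizontal direction. The thresholds and the normalized-coordinate convention are assumptions made for the example; the prototype's actual recognizer is not described at this level of detail.

```python
import numpy as np

def classify_wipe(centroids: np.ndarray, min_travel: float = 0.15):
    """Classify a hand-centroid trajectory (N x 2, normalized image
    coordinates) as a horizontal wipe. Returns 'next', 'prev', or None."""
    dx, dy = centroids[-1] - centroids[0]
    # Require mostly horizontal motion of a minimum extent, to reject
    # hovering hands and vertical movements (e.g., reaching for the gearshift).
    if abs(dx) < min_travel or abs(dx) < 2 * abs(dy):
        return None
    return "next" if dx > 0 else "prev"

# Usage: a left-to-right wipe across ~40% of the image width.
track = np.array([[0.3, 0.50], [0.4, 0.51], [0.55, 0.50], [0.7, 0.49]])
print(classify_wipe(track))  # -> 'next'
```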

3 Conclusions

The underlying software framework for (i) the definition of gestures (trajectory recording), (ii) their mapping to in-car functions, and (iii) gesture recognition is fully developed. Following the analysis of the survey, six gestures (A–F, Fig. 1) are currently implemented in the prototype and already deployed in a pilot study. First results have shown that drivers like this natural type of interface. It follows that in-vehicle function control with intuitive hand poses and gestures is a promising approach, which can easily be extended from email to other fields of application. Now that we have the tools at hand to control the entire application life cycle, and are aware of both the potentials and the system limitations, we will start a systematic investigation of (other) in-car services and devices controllable with hand gestures in a natural, intuitive way.
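The paper does not detail how recorded trajectories are matched in component (iii). One plausible textbook approach is template matching with dynamic time warping (DTW), sketched below purely under that assumption; the template trajectories and labels are invented for the example.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic-time-warping distance between two 2-D trajectories,
    tolerant of differing gesture speeds and lengths."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(trajectory: np.ndarray, templates: dict) -> str:
    """Return the label of the recorded template closest to the observation."""
    return min(templates, key=lambda label: dtw_distance(trajectory,
                                                         templates[label]))

# Usage with two invented wipe templates:
templates = {
    "next email": np.array([[0.3, 0.5], [0.5, 0.5], [0.7, 0.5]]),
    "prev email": np.array([[0.7, 0.5], [0.5, 0.5], [0.3, 0.5]]),
}
observed = np.array([[0.32, 0.52], [0.48, 0.50], [0.69, 0.51]])
print(recognize(observed, templates))  # -> 'next email'
```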

References

1. Verwey, W.B., Alm, H., Groger, J.A., et al.: GIDS Functions (chapter 7). In: Generic Intelligent Driver Support – A Comprehensive Report on GIDS. Taylor and Francis, London, UK (Sept. 1993) 113–146. ISBN 0-74840-069-9
2. McCallum, M.C., Campbell, J.L., Richman, J.B., Brown, J.L., Wiese, E.: Speech recognition and in-vehicle telematics devices: Potential reductions in driver distraction. International Journal of Speech Technology 7 (2004) 25–33
3. Hassel, L., Hagen, E.: Evaluation of a dialogue system in an automotive environment. In: 6th SIGdial Workshop on Discourse and Dialogue (Sept. 2005) 155–165
4. Navarathna, R., Dean, D.B., Lucey, P.J., Sridharan, S.: Audio visual automatic speech recognition in vehicles. In: AutoCRC2010 Conference, Melbourne (2010)
5. Geiger, M., Zobl, M., Bengler, K.: Intermodal differences in distraction effects while controlling automotive user interfaces. In: Proceedings HCII, New Orleans (2001)
6. Pickering, C.: The search for a safer driver interface: a review of gesture recognition human machine interface. IEE Computing and Control Engineering (2005) 34–40
7. Chaudhary, A., Raheja, J.L., Das, K., Raheja, S.: A survey on hand gesture recognition in context of soft computing. In: Advanced Computing. Volume 133 of CCIS. Springer, Berlin/Heidelberg (2011) 46–55
8. Wobbrock, J.O., Morris, M.R., Wilson, A.D.: User-defined gestures for surface computing. In: Proceedings of the 27th International Conference on Human Factors in Computing Systems (CHI '09). ACM, New York, NY, USA (2009) 1083–1092
