Technical Note PR-TN 2009/00672
Issued: 11/2009
Device-less interaction
G. Monaci; M. Triki; B.E. Sarroukh Philips Research Europe
Authors’ address
G. Monaci
HTC36-01
[email protected]
M. Triki
HTC36-02
[email protected]
B.E. Sarroukh
HTC37-21
[email protected]
© KONINKLIJKE PHILIPS ELECTRONICS N.V. 2009 All rights reserved. Reproduction or dissemination in whole or in part is prohibited without the prior written consent of the copyright holder.
Title:
Device-less interaction
Author(s):
G. Monaci; M. Triki; B.E. Sarroukh
Reviewer(s):
IPS Facilities, Shan, C.
Technical Note:
PR-TN 2009/00672
Additional Numbers:
Subcategory:
Project:
Device-less Interaction (2007-307)
Customer:
CL - Audio and Video Multimedia
Keywords:
Device-less interaction, input technologies, presence detection, proximity detection, automatic gesture recognition, attention detection, automatic speech recognition
Abstract:
This document describes the results of a technology survey for device-less interaction. The Device-less Interaction project (2007-307) aims at providing interaction options for future home appliances without resorting to a remote control or any other dedicated control device. The target home appliances that we have in mind are audio and audio-visual devices, with an emphasis on audio use cases. Indeed, device-less interaction can create a strong differentiator for a series of AVM products in key application scenarios. This survey focuses on the technologies rather than on the devices that should embed them; in this sense the survey is general and not restricted to these use cases. Still, we also discuss in this document relevant device-less interaction use cases for BU AVM and use them as examples to validate the technology landscape built here.
Conclusions:
The main output of this survey is a technology matrix summarizing the relevant characteristics of technologies for device-less interaction and the types of interaction that can be achieved using them. The introduction of interaction primitives in the technology matrix makes it easy to map interesting use cases to technologies. This report also describes how we identified two relevant use cases for AVM, “Device control at arm’s reach” and “Ubiquitous control of music”, and how, using the technology matrix, these use cases can be mapped to the appropriate technologies. To demonstrate the mapping procedure we describe in detail the whole process from scenario scripting to technology mapping and finally to prototype design and realization for the Device control at arm’s reach use case.
Contents

1. Introduction
   1.1. Objective and scope
   1.2. Structure of the report
2. Technologies for device-less interaction
   2.1. Presence detection
   2.2. Proximity detection
   2.3. Gesture control
   2.4. Attention detection
   2.5. Speech recognition
3. Use case selection for AVM devices
4. Mapping use cases to technologies
   4.1. Example: Device control at arm’s reach
        4.1.1. Scenario and technology mapping
        4.1.2. Interaction concept design and implementation
5. Conclusions
Bibliography
A Appendices
   A.1 Technology matrix
   A.2 Workshop: Use cases for device-less interaction
   A.3 Scenario scripts
        A.3.1 Device control at arm’s reach
        A.3.2 Ubiquitous control for music
1. Introduction
1.1. Objective and scope

This document describes the results of a technology survey for device-less interaction. The Device-less Interaction project (2007-307) aims at providing interaction options for future home appliances without resorting to a remote control or any other dedicated control device. The target home appliances that we have in mind are audio and audio-visual devices, with an emphasis on audio use cases. Indeed, device-less interaction can create a strong differentiator for a series of AVM products in key application scenarios, for example the ubiquitous control of audio and video playback for home audio and home cinema sound products, or the hands-free control of portable audio and video devices. This survey focuses on the technologies rather than on the devices that should embed them; in this sense the survey is general and not restricted to these use cases. Still, in the remainder of this document we discuss relevant device-less interaction use cases for BU AVM and use them as examples to validate the technology landscape built here.
This report is deliberately concise, as it is conceived as a supporting document for the main output of the Device-less Interaction project in 2009, the Technology Matrix, which summarizes technologies, characteristics, examples and comments, and which is enclosed in Appendix A.1. In this report we consider the term “interaction” in a broad sense, that is, we review technologies for a broad range of possible interactions, from presence detection for the automatic activation of the device interface to 3D gesture control for navigation and content manipulation. This survey covers the current state of technology and the available commercial products and solutions. Part of the survey builds on previous surveys and overviews; some points are therefore only briefly discussed, and more extensive overviews can be found in the related reports.
1.2. Structure of the report

The main goal of the Device-less Interaction project in 2009 was to compile a technology grid summarizing the available technologies enabling device-less interaction and their characteristics. In Section 2 we review the available technologies enabling device-less interaction. In Section 3 we describe the use cases that were judged most relevant for BU AVM and that were selected to focus the attention of the project. In Section 4 we illustrate the process of mapping the use cases of interest to technologies using the technology matrix; to do so we take as an example one of the selected use cases, Device control at arm’s reach. Section 5 concludes this report, discussing the results achieved in the 2009 Device-less Interaction project.
2. Technologies for device-less interaction
We have decided to organize this survey according to five macro categories of interactions with devices. These categories are arbitrary but reasonably general, and we believe that they cover exhaustively the range of possible device-less interactions with smart home appliances. The following interaction categories are covered in this Section:
- Presence detection
- Proximity detection
- Gesture control
- Attention detection
- Speech recognition
2.1. Presence detection

Presence detection is an essential feature in many fields and applications; it is typically used to detect people in order to activate (or stop) lights, machinery, automatic doors etc. Many technologies exist for presence detection, all of them mature and used in products that have been on the market for years. Sensors used for presence detection include: pyroelectric infrared (PIR) sensors, ultrasound and microwave radar, capacitive sensors, video cameras, active infrared sensors, laser scanners, light barriers and audio detection.
PIR sensors are the most widely used presence detection sensors, and they are commonly used in security lighting, burglar alarm systems etc. A PIR sensor is a motion detector which detects changes in the heat (far infrared) emitted naturally by humans and animals. PIR sensors are cheap and robust, but not accurate, as they cannot detect a stationary or very slowly moving body. Pyroelectric sensors are temperature sensitive: they work optimally at ambient air temperatures of around 15-20 degrees. An interesting review of the Parallax PIR sensor, one of the most common ones and available for less than $10, can be found at http://www.scaryterry.com/itw/pirsensor/pirsensor.htm.
Ultrasound (US) and microwave (MW) radars are active devices that work based on the radar principle. They operate on the same principle, but the waves emitted by the two sensors have a different wavelength, which implies different characteristics. MW sensors have a longer operating range (up to about 30m) than US sensors (about 10m) and their waves can travel through walls and partition materials. US waves can be better confined, and US detectors in general give more accurate detection results; however, US radars are disturbed by air flow and temperature changes. An exhaustive Philips Research report compares these two technologies for presence detection (1); the interested reader is referred to it for further details.
Capacitive sensors sense the perturbation of the electrostatic field produced by nearby objects. They are low-range devices that are widely used in industrial applications for distances typically around 5-10cm. Sensors detecting the change of capacitance at distances of about 1m exist, but in this mode they need to be carefully calibrated. Common capacitive sensors are extremely accurate (at mm level) and they can detect targets through other materials. Capacitive sensors are more expensive than US and MW sensors, although in principle they could be cheaper, as they are simpler to build, passive (less power consuming) and smaller. Excellent documentation on industrial sensors, including capacitive, US and MW sensors, is publicly available in documents published by sensor producers, e.g. (2) (3).
Active infrared (IR) presence sensors work by emitting an invisible pulsed IR light signal. The receiver looks for the reflected signal and reacts to changes in the reflection that indicate a presence in its detection area. Active IR sensors have an operating range of about 5m. They are mainly used as presence and motion detectors, alone or in combination with US or MW motion detectors, for automatic door systems like the Optex OA-203C (http://www.optex.co.jp/as/eng/industrial/oa203c.html).
Still based on the principle of reflected light, there exist laser scanner presence detectors. These are complex devices that scan the space in 2D or 3D to detect the presence of persons in areas with complex shapes. They are used for robot navigation and to monitor hazardous areas. These sensors are extremely accurate (they are used to build precise maps of the sensed area) and expensive (http://www.sick.com/group/EN/home/products/).
Cameras are widely used for surveillance and presence detection. Given the amount of information that cameras can provide, camera-based detection systems are often enriched with more advanced functionalities, for example identity recognition (4).
A very widely used presence detection system is based on beam breaking. Beam breaking sensors are made of two separate parts, an emitter and a receiver, placed far apart. A laser or light beam connects the two elements, and when the beam is interrupted presence is detected. These systems are very accurate and robust, which makes them particularly appealing for several industrial applications (http://www.directindustry.com/industrialmanufacturer/light-barrier-73340.html). They have the clear disadvantage of requiring two elements placed apart.
A recent tendency in this area is to integrate different modalities to compensate for the flaws of the single technologies and improve detection performance. For example, Siemens produces a motion detector, the UP370T, that integrates PIR and ultrasound sensors, and holds a patent on a presence detection system combining a PIR sensor with a video camera (5).
Presence can also be detected using audio. For example, several baby monitors are activated if the acoustic level detected in the baby’s room is higher than a certain threshold. Devices can also be explicitly activated by specific sounds that are recognized by an intelligent engine, for example clapping hands or whistling.
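As an illustration of the camera-based approach mentioned above, the following sketch shows how presence could be detected with a webcam and background subtraction in OpenCV. It is a minimal sketch under assumed settings: the capture device index and the pixel-count threshold are illustrative values, not taken from any product discussed in this survey.

```python
import cv2

# Minimal sketch of camera-based presence detection via background subtraction.
# A background model is learned online; a large amount of foreground signals presence.
cap = cv2.VideoCapture(0)                        # assumed: default webcam
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=False)
MIN_FOREGROUND_PIXELS = 5000                     # assumed sensitivity threshold

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                       # 0 = background, 255 = moving foreground
    mask = cv2.medianBlur(mask, 5)               # suppress isolated noise pixels
    if cv2.countNonZero(mask) > MIN_FOREGROUND_PIXELS:
        print("presence detected")               # here: wake up the device interface
    if cv2.waitKey(30) == 27:                    # ESC to quit
        break

cap.release()
```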
2.2. Proximity detection

Most of the technologies described in the previous section are also used to measure distances. The technologies used include US and MW radar, capacitive sensors, video cameras, active infrared sensors, laser scanners, stereo cameras and audio reverberation.
The most widely used proximity sensors are based on US and MW radar technology, and they have been popularized by the diffusion of car parking sensors.
Capacitive sensors are sometimes used for range detection, but they may not be accurate because a more distant "strong" target can give the same response as a nearby "weak" target. A capacitive-sensing lamp control was recently demonstrated at Philips Applied Technologies Labs, where the distance between the lamp and the user’s hand determines the light intensity.
Ordinary video cameras have also been used to estimate the distance of a target, but the estimate is typically not robust. To provide camera-based systems with more accurate and robust distance estimation capabilities, several mechanisms have been introduced in recent years. One well-established method is based on stereo cameras. Stereo camera systems are made of two cameras and estimate the distance of objects in the field of view by measuring the disparity between the two captured images. Stereo cameras are being superseded by simpler systems based on computational imaging, because stereo cameras require strong constraints to hold (constant illumination conditions, a certain distance between the cameras, limited occlusions, texture or edges in the scene), they are expensive and large, and depth estimation is computationally quite expensive. We describe computational imaging methods for depth estimation in more detail in the next section, since they are mainly used for gesture recognition.
Distance can also be estimated by detecting the reverberation time of audio signals. Philips holds several patents on this topic (6), but to the best of our knowledge this mechanism is not yet used in any product.
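To make the stereo principle concrete, the sketch below computes a disparity map from a rectified stereo pair with OpenCV block matching and converts it to depth via Z = f·B/d. This is only an illustrative sketch: the image file names, focal length and baseline are placeholder values that would come from calibration in practice.

```python
import cv2
import numpy as np

# Depth-from-disparity sketch for a rectified stereo pair (assumed input files).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # BM output is fixed-point (x16)

focal_px = 700.0      # placeholder focal length in pixels (from calibration)
baseline_m = 0.12     # placeholder distance between the two cameras in metres

valid = disparity > 0
depth_m = np.zeros_like(disparity)
depth_m[valid] = focal_px * baseline_m / disparity[valid]   # Z = f * B / d

print("median depth of valid pixels: %.2f m" % np.median(depth_m[valid]))
```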
2.3. Gesture control

In the quest for natural and instinctive ways of interacting with devices, gesture control interfaces have received much attention recently. This interest is witnessed by the excitement produced by Microsoft’s recent presentation of Project Natal (http://www.xbox.com/enUS/live/projectnatal/). The goal of gesture control systems is to capture and recognise human gestures and translate these into control commands. Gesture recognition thus requires more detailed information than just hand distance or position. Technologies used for gesture control include ultrasound sensor arrays, capacitive sensors, lasers, video cameras, video cameras with depth sensing (time-of-flight cameras), cameras with active IR illumination, stereo cameras, beam breaking and sound classification. No really mature technology for gesture control exists yet; however, research prototypes, road-show demos and startups are appearing with increasing frequency. We expect gesture control technologies to become more and more popular and readily available in the near future.
Figure 1. Array of ultrasound sensors from EllipticLabs. The ultrasound sensors are placed on the corners of the triangle. The device can accurately track the finger in front of it.
Ultrasound sensor arrays are made of a set of simple ultrasound sensors (at least three), each estimating the distance of a nearby object (Figure 1); regular microphones can also be used as ultrasound sensors. These distances are triangulated to estimate the 3D position of the target in front of the sensor. Ultrasound arrays have a limited interaction area and are not accurate enough for fine hand gestures, but they have a long operating range (10-20m) and are obviously robust to varying lighting conditions and acoustic noise. They also have some major drawbacks: notably, they require accurate calibration as they are sensitive to room geometry, the sensors can cross-talk, and the sensors have to be placed far apart and in a known geometry for triangulation.
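A minimal numerical sketch of the triangulation step is given below: given known sensor positions and measured ranges, the 3D target position follows from a least-squares solution of the linearized range equations. The sensor layout and range readings are made-up example values.

```python
import numpy as np

# Sketch of how an ultrasound array turns per-sensor range readings into a 3D position.
# Four sensors are used here; with only three, the front/back ambiguity has to be
# resolved by assuming the target is in front of the array.
sensors = np.array([[0.0, 0.0, 0.0],          # known sensor positions (metres)
                    [0.3, 0.0, 0.0],
                    [0.0, 0.3, 0.0],
                    [0.3, 0.3, 0.05]])
ranges = np.array([0.50, 0.45, 0.52, 0.48])   # example measured distances to the hand (metres)

# Linearize ||x - s_i||^2 = r_i^2 by subtracting the first equation:
#   2 (s_i - s_0) . x = r_0^2 - r_i^2 + ||s_i||^2 - ||s_0||^2
A = 2.0 * (sensors[1:] - sensors[0])
b = (ranges[0] ** 2 - ranges[1:] ** 2
     + np.sum(sensors[1:] ** 2, axis=1) - np.sum(sensors[0] ** 2))

position, *_ = np.linalg.lstsq(A, b, rcond=None)
print("estimated hand position (m):", position)
```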
Figure 2. GestIC: Capacitive sensing gesture interface from Ident Technology.
Capacitive sensors for gesture recognition are typically used to control a screen, as they can be embedded in the device as a frame. Figure 2 shows a capacitive gesture interface from Ident Technology which has this shape. A similar prototype has been built in recent years at Philips Applied Technologies Labs to browse pictures with gestures in medical applications (see Figure 3). The sensor detects the perturbation of the electric field caused by a nearby target and estimates its 3D position (7). Capacitive sensors are suitable for 2D pointing and other simple gestures at small distances (less than 1m) and are robust and cheap to build. However, they require accurate calibration, they have a limited interaction area and they can interfere with electronic devices.
Figure 3. Philips prototype of gesture-based navigation of medical images.
A few solutions exist that are based on detecting laser reflections. One is a hybrid system produced by Celluon that projects a laser virtual keyboard and detects the finger position by capturing the reflected laser and IR light (Figure 4 [left]). A system based on the active tracking of laser light reflected by the fingers is shown in Figure 4 [right] and described in (8). The system is extremely fast and accurate, but it is still very complex and has only been developed at an academic level.
Figure 4. Laser based gesture detection: Celluon virtual keyboard system [left] and laser-based active finger tracking [right].
Most of the solutions proposed for gesture control are video based. An exhaustive review of video-based gesture control technologies can be found in (9). In this report we overview the principal video-based technologies and highlight their advantages and disadvantages.
The first video-based gesture control systems to enter the market used a normal camera and computer vision algorithms to control different types of applications. In 2003 Sony launched the pioneering EyeToy for its PlayStation 2. The EyeToy is a camera that uses computer vision algorithms to segment the players’ silhouettes from the video and interpret movements and gestures. This solution, although innovative, had some drawbacks which limited its diffusion. For example, the accuracy of the people segmentation is greatly influenced by illumination conditions, distance and occlusions. The EyeToy only supports simple commands, as body gesture recognition is extremely challenging on conventional video. A gesture control system based on a normal camera and computer vision has also been introduced by Toshiba in its high-end Qosmio laptops (Figure 5). The system is extremely advanced and makes full use of the quad-core Qosmio processor. From user tests and reviews the gesture control appears reasonably robust and fluid, especially in its newer 2009 version. There also exist gesture control systems that use the embedded camera of mobile phones (http://www.eyesight-tech.com/). These systems use basic computer vision algorithms and exploit the proximity of the user to the camera to control games and other simple applications.
The information provided by normal cameras is not always sufficient to implement gesture control systems because of the sensitivity to illumination changes, the operating distance, motion and cluttered backgrounds. To alleviate these issues and retain the richness of information available with video-based approaches, in recent years vision systems have been enriched with mechanisms providing additional information and robustness. One hardware component that, combined with ad-hoc computer vision software, is gaining popularity for gesture control systems is the time-of-flight (TOF) camera. TOF cameras capture an ordinary RGB image and in addition create a distance map of the scene based on the time-of-flight principle. There are several implementations of TOF cameras, but in general the depth map is built using the LIDAR principle: modulated light is emitted by LEDs or a laser and the depth is estimated by measuring the delay between emitted and reflected light. The main producers of TOF cameras are Panasonic, Canesta, 3DV Systems, MESA Imaging, PMDTechnologies and IEE. TOF cameras are still very expensive as they require dedicated, accurate sensors: the MESA SwissRanger, for example, can be bought from the Acroname website for $9095 (http://www.acroname.com/robotics/parts/R330SR4000-10M-ETH.html). However, prices are likely to decrease as TOF cameras are rapidly gaining popularity because of their simplicity and compactness. In 2008 3DV repeatedly announced the launch of a TOF webcam for about $100. However, the commercialization never took place, partly because the company was acquired by Microsoft, which will allegedly use similar technologies for its Project Natal. Recently, several companies have shown gesture control prototypes based on TOF cameras. At CES 2009, for example, Hitachi showed a gesture interface featuring a Canesta TOF camera (Figure 5 [right]).
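The appeal of depth-enhanced cameras for gesture control is that the hand can be isolated with a simple depth threshold instead of fragile appearance-based segmentation. The sketch below illustrates this on a synthetic depth map; the function name and threshold values are illustrative assumptions and not part of any of the products mentioned above.

```python
import numpy as np

# Why a depth map (e.g. from a TOF camera) simplifies gesture tracking:
# the hand is simply the set of pixels closest to the camera.
def nearest_blob_centroid(depth_m, band_m=0.10, max_range_m=1.0):
    """Return the (row, col) centroid of the pixels within band_m of the nearest point."""
    valid = (depth_m > 0) & (depth_m < max_range_m)       # drop invalid / far-away pixels
    if not np.any(valid):
        return None
    nearest = depth_m[valid].min()
    hand = valid & (depth_m < nearest + band_m)           # thin depth slice around the hand
    rows, cols = np.nonzero(hand)
    return rows.mean(), cols.mean()

# Fake 240x320 depth frame with a "hand" at ~0.4 m in front of a 2 m background.
depth = np.full((240, 320), 2.0)
depth[100:140, 150:190] = 0.4
print(nearest_blob_centroid(depth))                       # ~ (119.5, 169.5)
```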
Figure 5. Video-based gesture interfaces. On the left the Toshiba Qosmio, which uses a standard camera for gesture recognition. On the right the Hitachi system presented at CES 2009, which features a Canesta TOF camera.
A less expensive, though less accurate, option to obtain 3D information from video is to project structured IR light patterns on the scene and to retrieve depth information from the way the structured light interferes with objects in the scene (10). This technique is particularly interesting because it requires only standard hardware and thus allows a cheaper implementation than TOF cameras. According to the press release, this type of mechanism is the one that will be employed by the dedicated hardware of Microsoft Project Natal (Figure 6): “Depth sensor: An infrared projector combined with a monochrome CMOS sensor allows “Project Natal” to see the room in 3-D” (11).
Figure 6. Microsoft Project Natal: example of a possible gesture interface [left] and detail of the dedicated sensor [right].
Despite their decrease in popularity, stereo cameras are also used to capture 3D gestures, as in the iPoint Explorer and Presenter prototypes from the Fraunhofer Heinrich Hertz Institute or in the LM3LABS Ubiq’window solution for signage (http://www.ubiqwindow.jp/english). In more constrained scenarios where the interaction space is confined to a certain region, a widely adopted solution to interpret gestures and pointing commands relies on the beam breaking principle. These systems provide a fixed interaction area which is created using IR light, and one or more IR cameras placed at different positions detect where the IR beams are interrupted by the hand. GestureTek, one of the leaders in gesture interaction, has a whole series of such products, called GestPoint. An example of one of their products, the AirPoint, is shown in Figure 7 [left]. This type of solution is also foreseen for the automotive industry, where the interaction is clearly confined to the space next to the driver. For example, Georgia Tech researchers in collaboration with Chrysler designed the Gesture Panel, a system that uses a camera aimed at a grid of infrared LEDs, with gestures made between the camera and the grid (Figure 7 [right]).
Figure 7. GestureTek AirPoint system [left] and schematic representation of the Georgia Tech Gesture Panel for vehicles [right].
In a broad sense, gestures are not only hand movements but also facial expressions. There is a huge literature on facial expression recognition, including some research effort currently ongoing at Philips Research. To the best of our knowledge, only a few robust systems exist on the market; one of the most reliable existing products is the SeeingMachines FaceAPI software (http://www.seeingmachines.com/product/faceapi/), which is capable of robustly tracking faces and extracting facial features in real time (Figure 8).
Figure 8. FaceAPI: face tracking and facial expression recognition software by SeeingMachines.
As far as gestures are concerned, we do not tend to think about audio. However, Scratch Input was recently proposed: an acoustic input technique that relies on the unique sound produced when a fingernail is dragged over the surface of a textured material, such as wood, fabric or wall paint (12). A simple audio sensor is coupled with existing surfaces, such as walls and tables, turning them into large, unpowered and ad-hoc finger input surfaces. Different scratch gestures are then classified using machine learning techniques and associated with ad-hoc commands.
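As a rough illustration of the idea (not the actual Scratch Input implementation, which trains a proper classifier on richer acoustic features), the toy sketch below merely counts high-energy bursts in the signal picked up from the surface; the frame length and thresholds are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

# Toy scratch-gesture detector: count strokes from the RMS energy envelope.
def count_scratch_strokes(samples, sample_rate, frame_ms=20, threshold=0.1):
    frame = int(sample_rate * frame_ms / 1000)
    n = len(samples) // frame
    energy = np.sqrt(np.mean(samples[:n * frame].reshape(n, frame) ** 2, axis=1))  # RMS per frame
    peaks, _ = find_peaks(energy, height=threshold, distance=5)  # roughly one peak per stroke
    return len(peaks)

# A stroke count could then be mapped to a command, e.g. 1 stroke = play/pause, 2 = next track.
```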
2.4. Attention detection

One very natural way of activating a device would simply be to look at it. Detecting whether a user is attending to a device can be done with different levels of accuracy. Detecting where someone is looking is typically done using a contact-less gaze tracker (eye trackers that require an attachment to the eye, such as a special contact lens, or electrodes placed around the eyes are typically used in clinical studies and are out of the scope of this report). A gaze tracker or eye tracker measures eye position and eye movement by sensing, with a camera or some other specially designed optical sensor, the light, typically infrared, that is reflected from the eye. The information is then analyzed to extract the eye rotation from the changes in the reflections. There are several vendors of eye tracking systems (e.g. Tobii, SeeingMachines) and several variations on the same idea. Professional gaze tracking systems typically use stereo cameras with dedicated lenses and require calibration to provide accurate gaze position estimates. These systems are the most robust and accurate, but also the most expensive. One example of such a system is depicted in Figure 9 [left].
Figure 9. On the left the SeeingMachines FaceLab, a gaze and head tracking system. On the right the gaze detection system Xuuk eyebox2.
There are also research prototypes of gaze tracking platforms that use active IR illumination and only one camera (13), and even systems completely based on computer vision (14). These systems may require calibration as well and they are much less robust and mature than commercial ones, but they are potentially less expensive. If only the head pose orientation is required, computer vision solutions are available, for example from SeeingMachines (Figure 8). We would like to underline here that very advanced computer vision solutions for head pose estimation and gaze tracking are also being developed within Philips Research, in the Video Processing group.
An alternative to gaze tracking that recently appeared on the market is gaze detection. A gaze detection device does not track where a person is gazing, but only checks whether someone is looking at it or not. One such device is the eyebox2 produced by the start-up Xuuk (https://www.xuuk.com/), shown in Figure 9 [right]. The eyebox2 has an IR camera and several IR LEDs illuminating the scene. When the eyeballs are aimed in its direction, they reflect light back to the camera, which detects the reflection and registers the fact that someone is looking at it from up to 10 meters away, without requiring calibration. The system looks promising, although still expensive (as of today, $5000).
2.5. Speech recognition

One natural way of giving commands is using speech. For several years, well-established technologies have provided all sorts of devices with automatic speech recognition capabilities. Speech recognition systems are typically made of three components: a capturing device, a pre-processing unit and a speech recognition engine. A complete overview of speech recognition systems can be found in (15). Here we want to underline a few facts about speech recognition architectures. Concerning the capturing device, three classes can be defined: headset solutions, handheld solutions and microphone arrays. In the device-less interaction context, only the last configuration is of interest, as the other two involve a device to be held or worn by the user. In a hands-free interaction context, because of the distance between user and device, the signal pre-processing (e.g. de-noising and de-reverberation) is of paramount importance. In the past years a number of world-class pre-processing solutions have been proposed by Philips Research, and they have proven essential for the robustness of the system (15). Speech recognition engines come in different flavours. Several solutions available on the market have reached an unprecedented level of robustness and maturity. The main player in the speech recognition field is Nuance (www.nuance.com/), which also supplies Philips with its recognition engines.
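To give an idea of the microphone-array pre-processing step, the sketch below implements a basic delay-and-sum beamformer that steers a linear array towards the talker before the signal is fed to a recognition engine. The array geometry, steering angle and sampling rate are example values; the Philips solutions referenced in (15) are considerably more sophisticated.

```python
import numpy as np

# Delay-and-sum beamformer for a linear microphone array (far-field plane-wave model).
def delay_and_sum(signals, mic_positions_m, angle_deg, fs, c=343.0):
    """signals: (num_mics, num_samples) array; mic_positions_m: mic x-coordinates on a line."""
    angle = np.deg2rad(angle_deg)
    delays_s = np.asarray(mic_positions_m) * np.cos(angle) / c     # relative arrival times
    delays_n = np.round((delays_s - delays_s.min()) * fs).astype(int)
    n = signals.shape[1] - delays_n.max()
    aligned = np.stack([sig[d:d + n] for sig, d in zip(signals, delays_n)])
    return aligned.mean(axis=0)          # coherent sum boosts the steered direction

# Example call: 4-mic array with 5 cm spacing, steering 60 degrees off the array axis.
# enhanced = delay_and_sum(mic_signals, [0.0, 0.05, 0.10, 0.15], 60.0, fs=16000)
```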
3. Use case selection for AVM devices
One of the aims of the Device-less Interaction project was to define use cases relevant for the AVM business in which device-less interaction is the favourite option. In this context, on 12 February 2009 we organized a workshop to generate and discuss use case scenarios where there is a need for device-less interaction. The workshop was attended by interaction experts from Philips Research: Peter Bingley, Bernt Meerbeek, Gianluca Monaci, Mahdi Triki, Dzmitry Aliakseyeu, Jia Du, Elke Daemen and Eddine Sarroukh.
The workshop consisted of two parts. In the first part, the participants were given a list of AVM devices and had to indicate which of these devices they use most often; for each device, the tasks and interactions that are currently performed were described. During the second part of the workshop, the participants were asked to think of the devices, functionalities and situations in which they would prefer to have device-less interaction, and to write that down in an interaction scenario, focusing on the interactions between user, context and devices. Thirteen use cases were generated during the workshop; they are collected in Appendix A.2. In general, participants found device-less interaction desirable in a number of scenarios, in particular when hands are busy or dirty, when the user is far from the remote control, when multiple users can control a device, and for safety reasons.
The project members, after careful evaluation and discussion with Andreas Keller (CL Architecture), Kuldeep Kulshreshtha (AVM Singapore), Bart Mantels (Innovation Lab Leuven) and Maurice McGinley (Design), decided to select two use cases to focus the attention of the project:
1. Device control at arm’s reach;
2. Ubiquitous control of music.
These two use cases were considered the most relevant for AVM products and also those for which device-less interaction is most useful and desirable.
Figure 10. Device control at arm’s reach
Device control at arm’s reach
Alice is in the dining room setting up the table for a dinner with friends. She wants to add some music to create a nice atmosphere. So she walks over to the audio system. The device detects her presence and displays a list of music choices. While looking at the choices, Alice simply uses her hands to scroll through the music and compiles a playlist for the evening (Figure 10).
For this use case, the motivations for device-less interaction are:
- Don’t have to fetch the RC first
- Interact in a comfortable position
- Sleek button-less design
- Dirty hands scenario
The Device control at arm’s reach use case has the following characteristics:
- Types of control: scrolling through lists (e.g. on a local display), pressing (virtual) buttons.
- Feedback mechanism: local graphics display for content browser, or virtual display.
- Operating distance: 0 – 1m.
- Mobility: fixed position.
Figure 11. Ubiquitous control of music.
Ubiquitous control of music
Emily is in the living room playing with her 2-year-old daughter. Music is playing in the background. Emily wants to change the music and select another artist while continuing to play with her daughter. When the new song starts, she wants to reduce the volume without getting up to fetch a remote control (Figure 11).
For this use case, the motivations for device-less interaction are:
- Easy-to-use text entry solution
- Instantaneous control without getting the RC first
The Ubiquitous control of music use case has the following characteristics:
- Types of control: limited set of commands, selection, text entry.
- Feedback mechanism: audio, light, projection, video.
- Operating distance: 0 – 10m.
- Mobility: flexible position anywhere in the room.
4. Mapping use cases to technologies
We use the Technology Matrix to map relevant use cases to interaction concepts. This mapping shows the utility and generality of the technology survey and proposes a methodology to support the implementation of a given use case using the appropriate technologies. The mapping follows three main steps (a small illustrative sketch of step 2 is given below):
1. Scenario scripting: decompose the use case into elementary interaction primitives, for example “approach the device”, “activate the device”, “navigate in 2D” etc.
2. Map interaction primitives to technologies: the mapping is straightforward, since the proposed Technology Matrix includes the technical characteristics of the scouted technologies and the interaction primitives that can be implemented using them.
3. Design interaction concepts: once the most promising technologies are selected based on the mapping, the interaction concept can be fine-tuned based on the technical characteristics required for the considered use case.
A schematic flowchart of the mapping process is depicted in Figure 12.
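The sketch below illustrates step 2 with a toy, heavily simplified technology matrix keyed by interaction primitive. The entries are a small excerpt paraphrased from this survey, not the full matrix of Appendix A.1, and the primitive names are illustrative.

```python
# Toy technology-matrix lookup: which technologies cover all primitives of a scripted scenario?
TECHNOLOGY_MATRIX = {
    "detect presence":      ["PIR", "ultrasound radar", "capacitive", "video", "beam breaking"],
    "detect proximity":     ["ultrasound radar", "capacitive", "stereo camera", "TOF camera"],
    "navigate in 2D":       ["capacitive array", "ultrasound array", "video", "beam breaking"],
    "press virtual button": ["capacitive", "ultrasound", "video", "beam breaking"],
    "enter text":           ["speech recognition"],
}

def candidate_technologies(scenario_primitives):
    """Intersect the per-primitive technology sets to find technologies covering the whole script."""
    sets = [set(TECHNOLOGY_MATRIX.get(p, [])) for p in scenario_primitives]
    return set.intersection(*sets) if sets else set()

# Example: the 'Device control at arm's reach' script mainly needs these primitives.
print(candidate_technologies(["detect presence", "navigate in 2D", "press virtual button"]))
# -> {'video', 'beam breaking'} (capacitive is listed as 'capacitive array' for navigation)
```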
Figure 12. Mapping use cases to technologies using the Technology Matrix is a three-step process: (1) decompose the use case into interaction primitives, (2) map the interaction primitives to technologies and (3) fine-tune and design the interaction concept.
To clarify how the mapping process from use case to interaction demo can be realized, in the next section we show how the use case Device control at arm’s reach has been designed and realized using the Technology Matrix. The scenario scripts for the Device control at arm’s reach and Ubiquitous control of music use cases can be found in Appendix A.3.
4.1. Example: Device control at arm’s reach

4.1.1. Scenario and technology mapping

Alice is in the dining room setting up the table for a dinner with friends. She wants to add some music to create a nice atmosphere. So she walks over to her audio system. The device detects her presence and displays a list of music choices. While looking at the choices, Alice simply uses her hands to scroll through the music and compiles a playlist for the evening.
Players: Alice (A), Smart Audio System (SAS)
Legend: [A] = Alice’s actions, [SAS] = Smart Audio System’s actions; the possible technologies for each step are listed with it.

[Alice] approaches her Smart Audio System.
[Smart Audio System] detects someone’s presence and goes into active mode (implicit detection). Possible technologies: Capacitive sensing, Passive Infrared, Ultrasound radar, Video (normal and IR), Beam breaking.
[SAS] gives feedback that it is active (implicit, lighting up a LED etc.).
[A] notifies that she wants to interact, by one of:
- Start gesture. Possible technologies: Capacitive array, Ultrasound array, Video (normal and IR), Beam breaking.
- Approaching hands/feet to a particular part of the device. Possible technologies: Capacitive, Ultrasound, Video (normal and IR).
- Looking at the device’s display. Possible technologies: Gaze.
- Producing a start sound (e.g. clapping). Possible technologies: Sound classification.
- Speech (“Start!”). Possible technologies: Speech recognition.
[SAS] detects that the user wants to interact and goes into full operating mode (explicit detection).
[SAS] gives feedback that it is in interaction mode, e.g. by showing the “start page” on the local display.
[A] switches to the Favorites dimension by a 2D motion. Possible technologies: Capacitive array, Ultrasound array, Video, Beam breaking.
[SAS] shows the songs in the Favorites space.
[A] navigates through her albums by 2D motion. Possible technologies: Capacitive array, Ultrasound array, Video, Beam breaking.
[SAS] shows the navigation in the Favorites space.
[A] selects the songs for the playlist by virtual button pressing. Possible technologies: Capacitive, Ultrasound, Video, Beam breaking.
[SAS] the selected songs fly into the playlist.
[A] notifies that she is done, by one of:
- Virtual end button pressing. Possible technologies: Capacitive, Ultrasound, Video, Beam breaking.
- End gesture. Possible technologies: Capacitive array, Ultrasound array, Video (normal and IR), Beam breaking.
- Approaching hands/feet to a particular part of the device. Possible technologies: Capacitive, Ultrasound, Video (normal and IR).
- Producing an end sound (e.g. whistling). Possible technologies: Sound classification.
- Speech (“Done!”). Possible technologies: Speech recognition.
[SAS] goes into sleep mode (e.g. only a LED on).
[A] goes back to preparing the table.
[SAS] automatically turns off after a few moments. Possible technologies: Capacitive sensing, Passive Infrared, Ultrasound radar, Video (normal and IR), Beam breaking.
4.1.2. Interaction concept design and implementation

The most promising technologies for this use case turn out to be capacitive sensors, ultrasound arrays and video. Capacitive sensors are frame-shaped; they detect the perturbation of the electric field caused by a nearby target and estimate its 3D position. Ultrasound sensor arrays are arrays of simple sensors (at least three) that use the radar principle to triangulate the position of an object in a predefined interaction area. After discussions with colleagues who have worked with ultrasound array technologies, we understood that the setting up and calibration of such systems is extremely delicate and must be repeated every time the system is displaced. In our setting, this appears to be a fundamental limit of this technology. Similarly, capacitive sensor arrays require accurate calibration, and capacitive sensors interfere with electronic equipment. Based on our technology survey and on these discussions, video stood out as the most readily available and easiest to use technology, as it consists of just one device that needs essentially no complex calibration. Furthermore, video has the advantage of being extremely flexible and providing a wealth of information at different scales. We thus decided to demonstrate device-less interaction at arm’s reach using a video-based solution.
The idea that we implemented is to create a virtual interaction space next to the device where hand movements and gestures are captured and interpreted as commands for any type of device. Two pictures of the final prototype are shown in Figure 13. The device comprises a camera with a wide-angle lens without IR coating and a filter to block light in the visible spectrum. An array of LEDs emitting non-visible IR light is controlled by a programmable Arduino Diecimila board and illuminates the space in front of the camera with IR light. The camera then captures only nearby objects (e.g. the hand) that are illuminated by the IR light (see Figure 14).
Figure 13. Prototype implemented to demonstrate device-less interaction at arm’s reach. An “Arduino Diecimila” board controls the IR LEDs, which are normally covered by a diffuser (top right picture). The wide-angle lens captures only nearby objects illuminated by the LEDs.
Figure 14. Frame captured by the IR camera. The picture also shows the optical flow calculated for this frame. Flow vectors inconsistent with the other vectors are discarded; they are drawn here in red.
The optical flow on the segmented hand is used to control iTunes and browse a music collection. To compute the optical flow, first “good features to track” are extracted using the Shi-Tomasi feature detector implemented in OpenCV (16). These features are then tracked from frame to frame using the Lucas-Kanade optical flow solver implemented in OpenCV (16). The average horizontal component of the optical flow (after discarding inconsistent flow vectors, as shown in Figure 14) triggers left and right movements. A peak in the average image intensity, which occurs when the hand approaches the camera as in a virtual button press, generates a mouse click to select songs. Snapshots of the demo are shown in Figure 15.
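The sketch below outlines this processing pipeline with OpenCV. The parameter values and thresholds are illustrative and not the exact ones used in the prototype; the capture device index is assumed, and a median of the flow is used here as a simple way of being robust to inconsistent flow vectors.

```python
import cv2
import numpy as np

# Sketch of the prototype pipeline: Shi-Tomasi features + pyramidal Lucas-Kanade flow,
# median horizontal flow for left/right swipes, mean intensity for a "virtual button press".
cap = cv2.VideoCapture(0)                       # the IR camera appears as a regular capture device
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.01, minDistance=7)
    if pts is not None:
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        flow = (nxt - pts)[status.flatten() == 1].reshape(-1, 2)
        if len(flow):
            dx = np.median(flow[:, 0])          # median is robust to inconsistent flow vectors
            if dx > 5:
                print("swipe right -> next item")
            elif dx < -5:
                print("swipe left -> previous item")

    if gray.mean() > 120:                       # a close hand brightens the IR image
        print("virtual button press -> select")  # a real implementation would detect the peak edge

    prev_gray = gray
    if cv2.waitKey(30) == 27:                   # ESC to quit
        break

cap.release()
```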
Figure 15. Browsing a music collection on iTunes using hand gestures.
5. Conclusions
Device-less interaction can provide a strong differentiator for BU AVM in several relevant use case scenarios. In this note we have reported the results of a survey of technologies enabling device-less interaction. The main output of this survey is a technology matrix summarizing the relevant characteristics of the overviewed technologies and the types of interaction that can be achieved using them. The introduction of interaction primitives in the technology matrix makes it easy to map interesting use cases to technologies. This report also describes how we identified two relevant use cases for AVM, “Device control at arm’s reach” and “Ubiquitous control of music”, and how, using the technology matrix, these use cases can be mapped to the appropriate technologies. To demonstrate the mapping procedure we have described in detail the whole process from scenario scripting to technology mapping and finally to prototype design and realization for the Device control at arm’s reach use case.
Bibliography
1. P.T.M. van Zeijl, H.M.J. Boots, M. Klee, B. Kumar, J. Mills, W.F. Pasveer, H. van der Zanden. A Comparison of Ultrasound and Radar for Presence Detection. Philips Research Europe, 2008. PR-TN 2008/00306.
2. Thomas A. Kinney, Brian Duval. Sensors 101: Baumer electric basics. Baumer Ltd.
3. EandM. EandM Siemens Self-Study Courses. [Online] http://www.enm.com/EandM/training/siemenscourses/snrs_1.pdf, http://www.enm.com/EandM/training/siemenscourses/snrs_2.pdf, http://www.enm.com/EandM/training/siemenscourses/snrs_3.pdf, http://www.enm.com/EandM/training/siemenscourses/snrs_4.pdf.
4. Viion Systems Inc. [Online] 2009. http://www.viionsystems.com/indexEN.htm.
5. Hansjürg Mahler, Martin Rechsteiner, Rolf Abrach. Presence detector and its application. US6486778, 2002.
6. Cornelis P. Janse, Corrado Boscarino, Rene M. M. Derkx. Audio Signal Dereverberation. EP1774517, 2005.
7. R. Wimmer, P. Holleis, M. Kranz, A. Schmidt. Thracker - Using Capacitive Sensing for Gesture Recognition. ICDCS Workshops, 2006.
8. A. Cassinelli, S. Perrin, M. Ishikawa. Smart Laser-Scanner for 3D Human-Machine Interface. International Conference on Human Factors in Computing Systems, 2005.
9. Caifeng Shan. Vision-based Gesture Control: A Review. Philips Research Eindhoven, 2008. TN-2008-00405.
10. C. Frueh, A. Zakhor. Capturing 2 1/2 D depth and texture of time-varying scenes using structured infrared light. CVPR, 2005.
11. Microsoft. Project Natal 101. [Online] 2009. http://download.microsoft.com/download/A/4/A/A4A457B3-DF5D-4BF2-AD4E963454BA0BCC/ProjectNatalFactSheetMay09.zip.
12. C. Harrison, S. E. Hudson. Scratch Input: Creating Large, Inexpensive, Unpowered and Mobile Finger Input Surfaces. ACM Symposium on User Interface Software and Technology, New York, NY: ACM, 2008, pp. 205-208.
13. Martin Böhme, André Meyer, Thomas Martinetz, Erhardt Barth. Remote eye tracking: State of the art and directions for future development. Conference on Communication by Gaze Interaction (COGAIN), 2006, pp. 10-15.
14. Jeremy Yrmeyahu Kaminski, Dotan Knaan, Adi Shavit. Single image face orientation and gaze detection. Machine Vision and Applications, Vol. 21, No. 1, 2009, pp. 85-98.
15. D.V. Aliakseyeu, et al. Speech & Gesture control for IPTV services - Technology scouting. Philips Research Europe, 2008. PR-TN 2008/00393.
16. G. Bradski, A. Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008.
A Appendices

A.1 Technology matrix
A zoomed-in version of the first columns and rows of the Technology Matrix is shown below.
A.2 Workshop: Use cases for device-less interaction
On 12 February 2009 a workshop was held to generate use case scenarios where there is a need for device-less interaction. The focus of the workshop was on use cases involving AVM devices, such as sound systems, home theatre systems, and portable audio and video players. The workshop was attended by the following interaction experts from Philips Research: Peter Bingley, Bernt Meerbeek, Gianluca Monaci, Mahdi Triki, Dzmitry Aliakseyeu, Jia Du, Elke Daemen, Eddine Sarroukh. The output of the workshop was a set of 13 use cases that are characterized based on the user situation and on the environmental characteristics. The 13 use cases are listed below.

1. Taking a bath
Lydia is taking a bath, relaxing, listening to the radio. Now she would like to switch to a CD and adjust the volume. She would like to do that without leaving the bathtub, so as not to get cold and not to make her bathroom wet. Besides, it is dangerous to operate the device with wet and soapy hands and to move around the bathroom (slippery floor). It would be just fine if she could use her voice to control the device (gestures seem inappropriate here, as her hands are soapy and wet).
User: Still, Alone, AVM device out of arm’s reach
Environment: Indoor, Quasi static, Low AV noise, Low illumination

2. Pictures with friends
Martin is showing his holiday pictures and videos on his new TV in his living room. It would be great if he and all his friends could interact and play with the content together (zooming, grasping, navigating etc.).
User: Mostly still, Multiple users, AVM device out of arm’s reach / far
Environment: Indoor, Quasi static, Some AV noise, Variable illumination conditions

3. Hanna’s kitchen
Hanna is watching a cooking program and preparing the meal at the same time in her kitchen. The program is recorded on the TV hard disk, so Hanna can “adjust” the timing to match her preparation time: for example, pause when she needs more time to prepare her recipe, go back if she missed an ingredient or did not understand the cooking technique, or fast forward to go to the next steps or skip commercials. However, she cannot really do all that now because her hands are dirty with food and she is handling cooking tools. Furthermore, it is not hygienic to cook and use the remote at the same time. The remote is also far away because there is no space on the table.
User: Moving, Alone, AVM device out of arm’s reach
Environment: Indoor, Quasi static, Some AV noise, High illumination

4. Thomas on the go
Thomas uses his portable MP4 player in various and changing environments (walking, in the train (stopping and moving), biking). As the audio SNR is continuously changing, it would be great if the volume could automatically adapt to the noise conditions. The same type of automatic volume adjustment would be really nice to account for the audio level variation between songs (some songs are much louder than others because of the encoding).
User: Moving, Alone, AVM device on the user
Environment: Outdoor / indoor, Rapidly changing, Strong AV noise, Variable illumination conditions

5. Lazy Sunday on the couch
It is Sunday morning 11:00 and Bart is recovering from an evening out with his friends. He is in a very lazy mood and wants to relax. He jumps onto the couch and decides not to leave it until dinner. Bart wants to watch the TV show he missed last night. After watching the show, he looks for something else on TV but he cannot find anything he likes. He then decides to check his emails. His mother asked whether he would come by today. Bart decides to send an sms to his mom saying that he is too busy today. After sending the sms, he puts on some lounge music and completes the atmosphere with lights. He closes his eyes and relaxes. His couch turns into relaxation mode and starts massaging Bart.
User: Still, Alone, AVM devices out of arm’s reach / far
Environment: Indoor, Quasi static, Low AV noise, Variable illumination conditions

6. Pool party
Jia and Karl are having a swimming pool party in their garden. Several friends came over and they are having a really good time swimming, talking and watching/listening to music videos. The device is far away, they are wet and everybody is engaged in a nice conversation. However, the music is now over and they would like to switch to a different type of content. It would be really nice if they could switch to a different DVD without touching the player (wet) and without interfering with the nice atmosphere.
User: Moving, Multiple users, AVM device out of arm’s reach / far
Environment: Outdoor, Quasi static, Some AV noise, Variable illumination conditions

7. Sharon and her baby
Sharon needs to breastfeed her baby. She takes the baby and sits down on the couch to start feeding. Then she notices that the curtains are still open. The baby is crying now because he is hungry, so she cannot use her voice to close the curtains, but luckily a simple hand gesture suffices. While feeding it becomes dark outside, so she tells the lights to switch on in a cosy atmosphere. A few minutes later a commercial block starts on TV, so Sharon lowers the volume with a voice command. When the baby is satisfied he falls asleep in Sharon’s arms. Sharon tells the TV to switch off, instructs the phone to go into “busy” (silent) mode, and tells the music system to gently put on some easing background music. Sharon gets a bit chilly from being inactive, so she instructs the heating system to increase the room temperature.
User: Still, Alone (interaction), AVM devices out of arm’s reach / far
Environment: Indoor, Quasi static, Some AV noise, Variable illumination conditions

8. Dinner time
Music in the background provides a nice eating experience. Therefore I switch to a music channel just before dinner time. So far so good, but sometimes commercials or unwanted programs disturb this experience. It gets so annoying that I need to switch to another channel. I need ways to control my TV or audio system from my chair without the need to put the remote control within my arm’s reach.
User: Still, Alone, AVM devices out of arm’s reach / far
Environment: Indoor, Quasi static, Some AV noise, Variable illumination conditions

9. Setting the timer
Imagine you are warm under the blanket listening to music helping you to fall asleep, but you forgot to set the timer. You want to set the timer without leaving the bed and even without taking your hands from under the blanket.
User: Still, Alone, AVM devices close / out of arm’s reach / far
Environment: Indoor, Static, No AV noise, Low illumination
10. Lisa’s atelier
Lisa is in her atelier working on a new oil painting. Her boyfriend calls her painting style somewhat chaotic. He doesn’t like to clean up the mess that Lisa leaves behind. There is paint everywhere: on the floor, on the light switches, and on the radio. Therefore, Lisa’s boyfriend bought her a radio that can be controlled without touching the device. She can now switch to a more aggressive song if she is not satisfied with her work, or to a very quiet song when she needs to work with concentration. She can also turn the volume down when her boyfriend is shouting to her from downstairs.
User: Moving, Alone, AVM devices close / out of arm’s reach
Environment: Indoor, Quasi static, Some AV noise, Variable illumination conditions

11. Camping
We are at a campsite on the seaside. We have our portable media player and we are watching a nice movie after a day at the seaside. Of course, we are drinking a beer and relaxing before taking a shower. Our neighbours have also turned on their radio, so Jia wants to increase the volume, but she is drinking her beer and the campsite is not the best place to operate the device, because of dirt, the lack of a stable position, etc.
User: Still, Multiple users, AVM device out of arm’s reach / far
Environment: Outdoor, Quasi static, Some AV noise, Variable illumination conditions

12. Phone call
I often listen to music while I’m cooking my dinner, but when I get a phone call it is disturbing to first look for the remote control to mute the audio system and then grab the phone. In this situation I need a simple, quick way to control my audio system and the phone at the same time!
User: Moving, Alone, AVM devices out of arm’s reach
Environment: Indoor, Quasi static, Some AV noise, High illumination

13. Painting nails
Lydia listens to her personal music while painting her nails. She remembers that her favourite radio program will start in a few minutes. Due to the painting it is inconvenient to look for the remote control under the blanket or to go to the docking station to switch to her favourite channel with the paint on her nails. It would be just fine if she could use a few hand movements and/or her voice to switch to her favourite channel!
User: Still, Alone, AVM devices out of arm’s reach / far
Environment: Indoor, Quasi static, Low AV noise, Variable illumination conditions
A.3 Scenario scripts
A.3.1 Device control at arm’s reach

Players: Alice (A), Smart Audio System (SAS)
Alice is in the dining room setting up the table for a dinner with friends. She wants to add some music to create a nice atmosphere. So she walks over to her audio system. The device detects her presence and displays a list of music choices. While looking at the choices, Alice simply uses her hands to scroll through the music and compiles a playlist for the evening.
Legend: [A] = Alice’s actions, [SAS] = Smart Audio System’s actions; the possible technologies for each step are listed with it.

[Alice] approaches her Smart Audio System.
[Smart Audio System] detects someone’s presence and goes into active mode (implicit detection). Possible technologies: Capacitive, Passive Infrared, Ultrasound radar, Video (normal and IR), Beam breaking.
[SAS] gives feedback that it is active (implicit, lighting up a LED etc.).
[A] notifies that she wants to interact, by one of:
- Start gesture. Possible technologies: Capacitive array, Ultrasound array, Video, Beam breaking.
- Approaching hands/feet to a particular part of the device. Possible technologies: Capacitive, Ultrasound, Video.
- Looking at the device’s display. Possible technologies: Gaze.
- Producing a start sound (e.g. clapping). Possible technologies: Sound classification.
- Speech (“Start!”). Possible technologies: Speech recognition.
[SAS] detects that the user wants to interact and goes into full operating mode (explicit detection).
[SAS] gives feedback that it is in interaction mode, e.g. by showing the “start page” on the local display.
[A] switches to the Favorites dimension by a 2D motion. Possible technologies: Capacitive array, Ultrasound array, Video, Beam breaking.
[SAS] shows the songs in the Favorites space.
[A] navigates through her albums by 2D motion. Possible technologies: Capacitive array, Ultrasound array, Video, Beam breaking.
[SAS] shows the navigation in the Favorites space.
[A] selects the songs for the playlist by virtual button pressing. Possible technologies: Capacitive, Ultrasound, Video, Beam breaking.
[SAS] the selected songs fly into the playlist.
[A] notifies that she is done, by one of:
- Virtual end button pressing. Possible technologies: Capacitive, Ultrasound, Video, Beam breaking.
- End gesture. Possible technologies: Capacitive array, Ultrasound array, Video, Beam breaking.
- Approaching hands/feet to a particular part of the device. Possible technologies: Capacitive, Ultrasound, Video (normal and IR).
- Producing an end sound (e.g. whistling). Possible technologies: Sound classification.
- Speech (“Done!”). Possible technologies: Speech recognition.
[SAS] goes into sleep mode (e.g. only a LED on).
[A] goes back to preparing the table.
[SAS] automatically turns off after a few moments. Possible technologies: Capacitive sensing, Passive Infrared, Ultrasound radar, Video, Beam breaking.
The most promising technologies for this use case turn out to be capacitive sensors, ultrasound arrays and video. Capacitive sensors are frame-shaped; they detect the perturbation of the electric field caused by a nearby target and estimate its 3D position. Ultrasound sensor arrays are arrays of simple sensors (at least three) that use the radar principle to triangulate the position of an object in a predefined interaction area. After discussions with colleagues who have worked with ultrasound array technologies, we understood that the setting up and calibration of such systems is extremely delicate and must be repeated every time the system is displaced. In our setting, this appears to be a fundamental limit of this technology. Similarly, capacitive sensor arrays require accurate calibration, and capacitive sensors interfere with electronic equipment. Based on our technology survey and on these discussions, video stood out as the most readily available and easiest to use technology, as it consists of just one device that needs essentially no complex calibration. Furthermore, video has the advantage of being extremely flexible and providing a wealth of information at different scales. We thus decided to demonstrate device-less interaction at arm’s reach using a video-based solution.
The pros and cons of the three candidate technologies are summarized below.
Video
Pros: Cheap and compact; Flexible and scalable; A lot of information available.
Cons: Not robust to light/appearance changes; Less accurate than capacitive and ultrasound.

Capacitive
Pros: Robust to light, temperature and air flow variations; Very accurate (more than video and ultrasound); Can detect a target through other materials; No cross-talk in arrays; Can distinguish mass; Accurate sensing field shaping.
Cons: Low operating range (max 0.5-1m); More expensive than video and ultrasound; Limited information; In arrays, limited interaction area (tens of cm); In arrays, sensors must be placed far apart and in a known geometry for triangulation; Interferes with electronic devices.

Ultrasound
Pros: Robust to light variations; Very large operating range (up to 20m); Accurate (less than capacitive, but more than video); Cheaper than capacitive, more expensive than video.
Cons: Moderately sensitive to temperature and air flow variations; Limited information; Cross-talk in (basic) arrays; In arrays, limited interaction area (tens of cm); In arrays, sensors must be placed far apart and in a known geometry for triangulation; Sensitive to room geometry (reverberation).
A.3.2 Ubiquitous control for music

Players: Emily (E), Smart Audio System (SAS)
Emily is in the living room playing with her 2-year-old daughter. Music is playing in the background. Emily wants to change the music and select another artist while continuing to play with her daughter. When the new song starts, she wants to reduce the volume without getting up to fetch a remote control.
Legend: [E] = Emily’s actions, [SAS] = Smart Audio System’s actions; the possible technologies for each step are listed with it.

[Emily] starts the interaction with her Smart Audio System.
[Smart Audio System] detects that the user wants to interact and goes into full operating mode.
[SAS] gives feedback that it is in interaction mode, e.g. by showing the “start page” on the local display.
Possible technologies: Gaze, Gesture (video), Audio scene analysis, Speech recognition.

[E] enters the artist name.
[SAS] feeds back the entered artist name.
Possible technologies: Speech recognition.

[E] validates the search query.
[SAS] shows the search results in the Favorites space.
Possible technologies: Implicit, Gesture (video), Audio scene analysis, Speech recognition.

[E] navigates through the search results and selects her preferred song.
Possible technologies: Browse and select (up/down, left/right) by Gesture; Identify and select (album/song name) by Speech recognition.
[SAS] shows the navigation in the Favorites space.
[SAS] the selected song starts playing.

[E] reduces the volume.
[SAS] lowers the volume.
Possible technologies: Gesture, Audio scene analysis, Speech recognition.

[E] goes back to playing with her daughter.
[SAS] automatically turns off the local display after a few moments.
Possible technologies: Implicit, Gesture (video), Audio scene analysis, Speech recognition.

Most promising technology: speech. Speech is essentially the only technology that allows commands to be input ubiquitously and that is sufficiently advanced and robust.