Direct Interaction with Large-Scale Display Systems using Laser Pointers
Kelvin Cheng
November 2002
Bachelor of Computer Science & Technology (BCST) Honours Thesis
Supervisors: Kevin Pulo and Professor Peter Eades
School of Information Technologies, Sydney, Australia
Abstract

Existing large-scale display systems generally adopt an indirect approach to user interaction. Alternatives to standard desktop-oriented devices, such as the computer mouse, can be considered to improve user efficiency when controlling large wall-sized displays. By using an infrared laser pointer and an infrared tracking device, a more direct interaction with the large display can be achieved, reducing the level of cognition required and giving the user greater mobility. The principal concepts behind the proposed system are hotspots, regions surrounding objects of interest, and gestures, movements made with the laser pointer that trigger an action, similar to those used in modern web browsers. These concepts are demonstrated by an add-in module for Microsoft PowerPoint using the NaturalPoint™ Smart-Nav™ tracking device.
Acknowledgements
I would like to sincerely thank Kevin Pulo, my supervisor, for the tremendous amount of time and effort he dedicated to this project throughout the year; it would not have been successful otherwise. I would also like to thank Peter Eades, also my supervisor, for providing great ideas and wisdom for this thesis. To Masahiro Takatsuka, thank you for the many useful discussions and encouragement. Thanks to Judith Dawes from the Department of Physics at Macquarie University for her invaluable technical expertise with lasers. And finally to hons, especially G66, for providing the entertainment and making this year so much more enjoyable.
Contents

1 Introduction
2 Background
3 Concepts
   3.1 Model-View-Controller Paradigm
   3.2 Display Systems
   3.3 Selection
4 Interaction Paradigm
   4.1 Challenges Addressed
   4.2 Proposed Solution
5 Implementation
   5.1 System Overview
   5.2 Highlight
   5.3 Gestures
   5.4 Navigation
   5.5 An Application – PowerPoint add-in
   5.6 User Cases
   5.7 Optimisation
   5.8 Distortion
   5.9 Unresolved technical difficulties and limitations
6 Evaluation
7 Conclusion and Future Work
   7.1 Conclusion
   7.2 Potential future uses of this system
   7.3 Future work
Chapter 1
Introduction
Large-scale display systems spanning an entire wall are widely used in many modern information technology facilities, especially for non-interactive purposes such as presentations. Where they are used interactively, the user interaction devices typically consist of a standard keyboard and mouse. However, there are a number of reasons why these devices are less than optimal for large displays.

When Douglas Engelbart developed the mouse in 1968, it was intended to provide a way for users to interact with personal computers [1]. It was never designed to be used in a large display environment, and as a result it performs only moderately well when scaled to large screens. Imagine a user giving a talk to an audience while their presentation slides are shown on a large screen behind them. To use the mouse, the user needs to place it on a desk or another flat surface. This constrains the user to stay within arm's reach of the table, reducing their mobility. To interact with the system, the user moves the mouse in the horizontal plane of the table while the cursor moves in the vertical plane of the display. The user must also compensate for the fact that small movements of the mouse produce much larger movements on the screen. Time is therefore required for the user to work out what action is needed for the cursor to reach the desired position on the display. Their cognitive load is increased, distracting them from their task. Yet another problem is the need to locate the cursor on the large display, which demands visual attention. Thus the mouse is not optimal for interacting with large displays.
A better approach is a system that allows for direct interaction between the user and the objects seen on the large display, for example, manual manoeuvres to point at the display or to rotate objects by twisting, pushing or turning of the hands. In general, we need a device that allows the user to interact directly with the display without the need for an intermediary device. Such systems are more natural and easier to use.
The aim of this thesis is to explore ways of interacting with large displays using a more direct approach. The method chosen is an infrared laser pointer with an associated tracking device, and we examine its use in the area of large-scale display systems. We provide solutions to several of the challenges that emerge in such systems, specifically latency and selection. A demonstration of these techniques is implemented in the context of presentation software.
An in-depth exploration of various input devices and their suitability for such interaction is presented in the next chapter. Chapter 3 provides an overview and the basis of this thesis in terms of the MVC model and the concepts used. The paradigm for direct interaction with large displays is then discussed in Chapter 4. The details of our LasIRPoint implementation are given in Chapter 5. Chapter 6 presents an evaluation of the system, and Chapter 7 concludes and outlines future work.
Figure 1.1: A schematic representation of an existing large-display interactive system, and a photograph of a user using such a system.

Figure 1.2: An example of direct interaction with a large display, and a photograph of a user using such a system.
Chapter 2
Background

Manipulation is the adjustment or change made to an object in some shape or form. In information technology it goes as far back as the 1960s, when Seymour Papert at the Massachusetts Institute of Technology developed the LOGO computer programming language. The concept of symbolic computation [2] was adopted to give a visual representation to words and ideas. A graphical representation, a turtle, allows children to type simple commands and observe their effect on the turtle on screen. This is an example of indirect manipulation: instructions are entered into the computer using the keyboard in order to manipulate the turtle. A more direct approach would be to control the turtle by interacting with its on-screen representation. This is the principle behind direct manipulation, a term coined by Ben Shneiderman in 1983 [3]. MacDraw uses this approach to allow users to manipulate lines and shapes in a drawing [4,5] by providing a tool palette along the left side of the drawing. Users select different tools with their mouse to manipulate the drawing as desired. The manipulation is achieved by using the mouse to interact with the system.

Interaction, as distinct from manipulation, refers to the method by which users provide input to the system and the system provides feedback to the user. The interaction technique used in this project is the pointing device. The traditional keyboard is sufficient for inputting characters or numbers, such as the simple commands children enter in LOGO. However, greater convenience and efficiency can be achieved if one can point or aim directly at a location of interest. Such a system is more direct and easier to use. Over the years, various novel input devices have been developed to address the problem of indirectness, with varying degrees of success – a comprehensive overview is given in [6]. We now consider some devices that provide a somewhat more direct approach than the mouse.
The lightpen is one of the first pointing devices produced [7]. It is pressed against a CRT display, where a switch on the pen allows it to capture the light produced by the screen. However, it is not suitable for prolonged use due to its poor ergonomics, and it was eventually replaced by the mouse. As discussed in Chapter 1, the mouse provides an indirect method of interacting with the computer.

Figure 2.1: Light pen

Light gun technology has been used extensively, especially in the computer gaming industry. It also provides a direct approach to interacting with CRTs and, unlike light pens, it can operate at a distance. Although accuracy is maintained, its major drawback is that it must be used with a CRT display. Thus it cannot be used with large-scale displays, since their images are provided by a data projector.

Figure 2.2: Light gun
The Polhemus FasTrak and Logitech 3D Mouse belong to a category of industrial strength tracking systems primarily designed for 3D motion tracking, as found in applications such as CavePainting [8].
FasTrak is an electromagnetic tracking system that computes the position and orientation of a tiny receiver as it moves through space. Its major problem, however, is its vulnerability to electromagnetic interference and radiation, particularly from the monitor. In addition, the system has a limited range of 3 metres and a latency of 4 milliseconds. It is primarily designed for 3D motion capture in a Virtual Reality environment.

Figure 2.3: Polhemus FasTrak

The 3D Mouse is a similar tracking system. It uses a stationary triangular transmitter which emits ultrasonic signals to track the movement of a smaller triangular receiver. This resolves the problem of electromagnetic interference but introduces interference from other equipment that uses ultrasonic signals. The system also has a limited range of 2 metres and a high latency of 30 milliseconds.

Figure 2.4: Logitech 3D Mouse

Devices of this class are typically used for CAD object manipulation and Virtual Reality; they are cumbersome and expensive, costing up to US$6000, and are thus considered inappropriate for use in our system.

Figure 2.5: Logitech Spaceball

The Logitech Spaceball, Gyration Gyromouse and Interlink RemotePoint represent another category of input devices, designed for personal use as a mouse replacement. The Spaceball is a device with a ball-shaped controller mounted on top. It allows users to push, pull and twist the ball in order to manipulate on-screen objects. It is designed for 3D model manipulation and provides more natural movement for the user.

Figure 2.6: Gyration Gyromouse

The Gyromouse is based on a technology called GyroPoint, which uses gyroscopes to detect angular movements of the device. These rotations can be used to control a 3D object or a mouse cursor. The RemotePoint allows users to roll their thumb around a soft rubber pointing button fitted onto a handheld device. The advantages of these devices over the previous category are that they are wireless, more affordable and more natural, although the user still interacts through an intermediary device.

Figure 2.7: Interlink RemotePoint
Another category of input devices tracks the position of the head of the user and moves the mouse cursor on the display correspondingly.
Figure 2.8: Origin Instruments HeadMouse

These are primarily developed to provide full mouse control to people who cannot use their hands but have good head control. Devices in this category include the Synapse Head Tracking Device, the Origin Instruments HeadMouse and the NaturalPoint Smart-Nav.

Figure 2.9: NaturalPoint Smart-Nav

Eye tracking devices follow the movement of the pupils and can in principle be used for cursor control [9]. This technology is beneficial within a confined, controlled space, but there is currently no evidence to support its use on a large display. Another promising input system is voice recognition; although recognition accuracy is improving, these systems are primarily command-based [6], and thus indirect.

Figure 2.10: Eye tracking device
Touch screens provide an excellent solution to the problem of direct interaction and have high precision [10], but again they do not scale well to large displays, as manufacturing such a large touch-sensitive display is not feasible (even with technology such as DiamondTouch [11] or SmartSkin [12]). Even if it were possible, the user could not reach the top of the display, nor pace across it from side to side. The same can be said of MimioMouse [13], which uses a pen-like device that lets the user press against the wall at a specific location on the display.

Figure 2.11: SmartSkin on a touch screen
Recent inventions such as the handheld Personal Digital Assistant (PDA) and the graphics tablet use a stylus, a specialised pen, to perform input. Such devices are only useful for personal use, where they allow direct interaction. However, they are still indirect when used with a large display, because the user performs input on the handheld device and receives feedback on the large display on the wall.

Figure 2.12: Mimio Mouse

Figure 2.13: Wacom tablet

Figure 2.14: Compaq PDA

One device that is attracting an increasing amount of research is the laser pointer. It has the advantages of mobility, direct interaction and being comparatively inexpensive, with the notable disadvantages of lag and the instability of the human hand.
Many studies into these systems have been carried out. Dwelling is a popular technique for interaction [14], and Olsen investigated the effect of lag with this method [15], which led to a discussion on the use of visible and invisible laser pointers [16]. The problem of hand jitter was presented in [17, 18].

Figure 2.15: Laser pointer
Chapter 3
Concepts

This chapter provides an understanding of each element involved in the systems we examined. Since these terms can have different meanings to people in different domains, we present here how we have defined them within the scope of this study. Most of these concepts are defined in terms of the Model-View-Controller (MVC) paradigm [19].
3.1 Model-View-Controller Paradigm

MVC decouples an interactive user interface into three parts. The Model is the application data to be presented; the View is a representation of the Model displayed on-screen; the Controller is the user who is interacting with the system. Figure 3.1 is a diagram portraying the MVC paradigm.
Figure 3.1: The MVC paradigm.
This paradigm is generally used in Object-Oriented design for software systems. In this project, we apply it to an interactive human-computer system. The Model in our system is the data being displayed on-screen. The View corresponds to the large display, and the Controller is the user and input device. In transition (a), the user modifies the application data within the Model. In transition (b), the View is updated to reflect these changes. In transition (c), the user observes the representations on the View, and in transition (d), the user uses an input device to interact with the View. This thesis focuses on the View, the Controller, and transitions (c) and (d), which together we call display systems.
3.2 Display Systems

Any sensory output device is a display; displays act as the View in MVC. They may be as common as a monitor, or as specialised as 3D sound (where a sound can be accurately placed at a specific position in space using 3D spatialisation [20]), haptic devices (which provide force feedback through the device that the user is holding [21]), or a CAVE (where two or more large screens are joined together to make a room-like screen). In our system we deal only with monitor-type displays in bounded 2D vertical planes. We are only interested in displays that are larger than a person, which we call large-scale displays. Such displays have the problem of reachability – they are too tall for the user and too wide, beyond arm's length. Therefore, some kind of pointing device, or pointer, is needed. These are user input devices (hardware) which can be used to locate a particular point in space (used by the Controller to interact with the View, figure 3.1d). Such devices include the computer mouse, laser pointer, GyroPoint, head tracker and 3D tracker, all of which were discussed in Chapter 2.

Two types of inaccuracy exist in such systems. The first is inaccuracy in time, or latency. This is the time difference between when the user moves the pointing device (the time as perceived by the user) and when the display updates to reflect this change, if any update is necessary (the time as perceived by the system). In terms of MVC, it is the time taken to complete transitions (c) and (d).
The second type is inaccuracy in space, resulting from pointer instability. The unsteadiness of our hands affects the exact location of any pointing device on the screen. Different devices have different amounts of instability; for example, a mouse typically has low instability, whereas a laser pointer has high instability, which increases as the distance to the display increases.

Interaction refers to the cycle between (c), the user observing the output display, and (d), the user operating the input device to change the display, together with the response of the system to these changes. There are two broad categories of interaction: direct and indirect.
The category an interaction belongs to depends on the amount of physical disparity between the pointing device space and the output space, and the mapping between these spaces. For a touch screen, the input space is the same as the output space, giving very direct interaction. However for a mouse, the input space is the flat horizontal surface of a table and the output space is the vertical surface of a monitor. We can see that these two spaces do not overlap and there is a mapping between them, giving an indirect interaction.
Apart from this physical disparity, the level of cognition can also affect the level of directness. The level of cognition depends on the amount of thinking required of the user when using the input device. Returning to the previous examples: with a touch screen, the user points directly at the location on the screen where they want to interact; with a mouse, the user must first think about how to move it so that the on-screen cursor reaches the desired location. The fact that users require some level of thinking means that the input device is partially indirect. Although this higher level of cognition can be reduced with practice, it is always present.
The level of directness is also influenced by whether the input space can be linearly transformed into the output space, as expressed by the following relation:

    f(x_input) = c · x_output    (where c is a constant).

This is true for both the laser pointer and the touch screen, since in both cases the input and output spaces are the same and can therefore be linearly transformed. However, this is not the case with the mouse:

    f_touchscreen(x, y) = (x, y)
    f_laser(x, y) = (x, y)
    f_mouse(x, y, z) = (x, z)    (where the z direction is perpendicular to the display).

If the transformation between the input and output space is linear, a lower level of cognition is required, giving a higher level of directness.
3.3 Selection

An object is a graphical representation of a piece of data in the Model as displayed on the View. It can be in the form of characters, or of pictures drawn with lines and shapes. It has no meaning without the underlying data, and it is quite different from an object in an Object-Oriented software system. The bounding box of an object is the smallest rectangle which fully encloses the object.
A selection is a set of objects of interest that can be used for manipulation. For example, selecting files to copy in Microsoft Windows Explorer, or moving a set of shapes to a different location in a graphics application. Manipulations are applied to all objects in the selection. The system must provide user-interface operations for adding and removing objects from the selection. These operations are called the selection process.
The selection process is the action users need to accomplish in order to include an object in the selection or remove it from the selection. During this process, an object must first be a selection candidate before it can be selected. The selection candidate is the object indicated by the input device as a potential for adding to or removing from the selection, although it may or may not be of interest to the user. The selection process also includes how an object is indicated as a selection candidate.
Some examples of possible selection processes include point and click, circling, dwell clicking and the use of arrow keys on the keyboard.
Point and Click – The process of using a pointing device, such as a mouse, to aim at an object of interest. The user then selects the object by clicking a button on the input device. This is the most commonly used method since all personal computers now come with a mouse as standard.
Circling – Using natural movements of the human body, such as the hands, a selection can be made by gesturing a circle around an object of interest. Implementing such an action requires additional computation compared to Point and Click. However, it is an alternative when a clicking mechanism is not present.
Dwell Clicking – After pointing to an object of interest (making it the selection candidate), the pointing device stays on that object. After some time delay, the object is selected. This is another alternative to Point and Click. Its major drawback is that additional time is consumed when selecting. It also requires a steady hand from the user, which sometimes cannot be achieved.
Arrow keys – The user uses the arrow keys to change the selection candidate. An example in the Microsoft Windows environment: when an object is selected, the "Control" key is held down while pressing the arrow keys, and a greyed outline indicating the selection candidate moves accordingly (figure 3.2b).
Selection cues are visual techniques used to show the user which objects are selected or are selection candidates. Possible techniques include foreground highlighting, background highlighting, outline highlighting and animation.
Foreground highlighting – The appearance of the foreground of an object changes. For example, a folder icon in Microsoft Windows Explorer changes colour when selected (figure 3.2): the yellow folder icon in (a) has a blue tint added to it in (c).
Figure 3.2: (a) A folder icon in Microsoft Windows Explorer prior to selection. (b) The same folder icon as a selection candidate. (c) The same folder icon after selection.
Background highlighting – The background of an object changes appearance. In the folder icon example of figure 3.2, the string "New Folder" has a blue background after selection.
Outline highlighting – This technique only highlights the outline of an object rather than its background (figure 3.2b). This has the advantage that the highlighting will not greatly affect the appearance of the object, and thus it is well suited to selection candidates. In figure 3.2(b) we can see the text “New Folder” has a dotted outline around it.
Animation – Rather than altering the appearance of the object, the entire object may change, or morph into different shapes or other objects. A wide variety of animation techniques can be used. A common example is a link in a navigation bar on a website that provides a drop-down menu when hovered over (figure 3.3).

Figure 3.3: Selection cue using animation. (a) A link before "All Products" is selected. (b) Drop-down menu appearing after selection.

Figure 3.4: (a) A textbox before any highlighting. (b) A textbox with background and outline highlighting. (c) A textbox with outline highlighting only.
A combination of outline highlighting and background highlighting is used extensively in the implementation of our system. For example, when a textbox becomes a selection candidate, both outline and background highlighting occur, so that the whole box is highlighted as well as its outline (figure 3.4b). Background highlighting is only useful when the textbox is not filled with a particular colour; if it is coloured, only the outline highlighting is applied, so that the colour of the textbox is not disturbed (figure 3.4c). All other shapes have outline highlighting. In our system, a selection candidate is highlighted in grey, while a selected object is highlighted in yellow.
Chapter 4
Interaction Paradigm
As human beings, we tend to use our fingers when we want to indicate directions, interest in a particular object, groups of objects, or part of an object. A laser pointer is a good option in this case because it provides a natural way to point at something at a distance. Our aim is to provide a more direct technique for interacting with large displays. We present an interaction paradigm that addresses several challenges encountered in large-display systems. The paradigm is described in the specific context of laser pointer systems, although the concepts are easily applied to direct interaction techniques for large displays in general.
4.1 Challenges Addressed

The main challenge is to develop a solution which can be used with general pointing devices and provides a more direct approach for interacting with large screens. In order to develop this solution, we will first examine the challenges we address in current systems.
As mentioned in Chapter 1 the mouse only performs moderately well when scaled to large screens. It reduces the mobility of users, increases their cognitive load and is ergonomically poor, requiring the user to turn their head to look at the large screen. All of these problems lead to undesirable distractions when interacting with the system.
In addition to these problems posed by the mouse, we present other issues that the mouse and other pointing devices, such as those mentioned in Chapter 2, still suffer from. These are pointer instability, system response latency and efficient selection. We will now look at each of them in detail.
A potential problem arises because our hands are unsteady [18], making the input device unstable and reducing the accuracy of the cursor position on screen. The problem of hand jitter is significant with the laser pointer because it is held by the user without any support [18]. If a mouse were used instead, the user would rest it on a desktop and only move it when required, giving low instability. With the laser pointer, the user must hold it still in mid-air without support, which increases the instability. Such jitter moves the beam slightly away from the target and is therefore significant [17]. In addition, the jitter increases as the user moves further away from the screen [18]. A moving average filter was proposed by [17], but the accuracy did not improve significantly. We therefore aim to reduce this effect so that the interaction method can be as accurate as possible.

It is anticipated that some amount of lag will be involved due to slow system response. Video-based tracking systems typically work at between 25 and 50 frames per second, which corresponds to a latency of between 20 and 40 milliseconds simply from video frame updates. In addition, there is the time taken to determine the position of the pointing device and to compute the on-screen cursor position. All of these accumulate to confuse and disorient the user, since the cursor is not where they expect it to be.

Most pointing devices provide a button for clicking so that items on screen can be selected using Point and Click. However, some pointing devices, such as the laser pointer or the head mouse, do not provide a button for selecting objects. An obvious solution is to add a button to these devices. However, this type of solution is hardware specific, and thus mostly non-transferable between pointing devices. Furthermore, in most cases a cable is required for such a solution to work, reducing mobility. Solutions that use some type of wireless technology (such as radio or infrared) tend to be bulky because of the extra hardware and batteries, and the increased power requirements mean that the batteries require frequent replacement or recharging. One button in most laser pointers that should not be overlooked is the on/off switch. It is possible to use this switch to act as a button [14]. However, as suggested in [18], when the button is released the beam often moves away from the target before it goes off, so it does not provide a good indication of the intended selection. Dwell clicking is a possible candidate for replacing a button click; however, its major disadvantage in a laser pointer system is that it takes at least 2 seconds to make a dwell selection [18]. A software solution, on the other hand, provides flexibility and can be applied to a whole range of pointing devices. In light of these issues, we choose to avoid clicking as much as possible and instead provide a paradigm for direct interaction without clicking.
4.2 Proposed Solution

We address the above challenges by hiding the on-screen cursor and laser pointer, not requiring clicking, and using hotspots and gestures. Each of these will now be discussed in detail.
By not displaying the on-screen cursor, the problems of hand jitter and high latency are hidden from the user. This can also be achieved by having an invisible infrared laser pointer. This is desirable as the user will not be distracted by these, allowing them to focus instead on the on-screen objects.
However, these policies introduce other issues. The most important is how users can select a particular object without an on-screen cursor and without a button for clicking. We solve this through the use of hotspots and gestures to perform selections.
Hotspots are areas around objects which change appearance when the pointer enters them; such changes include highlighting with a coloured background. This provides a mechanism for objects to be selected without clicking. Technically, the hotspot resides inside a bounding box around an object. When the pointer crosses the boundary of this bounding box, the object is selected (figure 4.1). As a result of the unavoidable lag, the system detects the crossing of the boundary some time after the pointer actually crosses it, so by the time the system detects the hotspot activation, the user's pointer is likely to be near the centre of the object. This is ideal, because people tend to point towards the centre of an object rather than its edges. Thus, highlighting when a hotspot is entered counteracts the effect of the lag. The object reverts to its original appearance when the user points away from it.

Figure 4.1: (a) Initial state, where the black arrow represents the position of the pointing device, which is not currently selecting the object. (b) To select the object, the user moves their pointer towards it. However, due to latency, the system is not yet aware of this and still holds the original position of the pointer, illustrated with a dotted arrow. (c) As the pointer moves inside the object, the system detects the crossing of the boundary by the laser beam. (d) The system reacts to this crossing and highlights the object, while the pointer approaches the centre of the object. (e) As the pointer leaves the object, the object reverts to its initial state.

Gestures are natural movements of the hand (as indicated by the path traced by the pointer) which the system recognises, allowing an action to be performed. Such methods have been used successfully in modern web browsers such as Mozilla and Opera, in the 3D game "Black and White", and in popular 3D modelling programs such as Maya. The idea here is also to use gestures to select objects, by circling around the object (figure 4.2).
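The hotspot behaviour can be summarised in code form. The following is only a minimal VBA sketch, not the LasIRPoint implementation described in Chapter 5; the helper names, the separate highlight-box shape and the colour values are assumptions of ours. Each frame, the reported pointer position is tested against a shape's bounding box: entering the box applies a grey candidate highlight, and leaving it makes the highlight transparent again.

' Minimal sketch of hotspot entry and exit handling (assumed names and colours).
Function InsideBounds(x As Single, y As Single, shp As Shape) As Boolean
    ' True when the point lies within the shape's bounding box.
    InsideBounds = x >= shp.Left And x <= shp.Left + shp.Width And _
                   y >= shp.Top And y <= shp.Top + shp.Height
End Function

Sub UpdateHotspot(x As Single, y As Single, shp As Shape, _
                  highlightBox As Shape, ByRef wasInside As Boolean)
    Dim isInside As Boolean
    isInside = InsideBounds(x, y, shp)
    If isInside And Not wasInside Then
        highlightBox.Fill.ForeColor.RGB = RGB(192, 192, 192)   ' grey: selection candidate
        highlightBox.Fill.Transparency = 0.3
    ElseIf wasInside And Not isInside Then
        highlightBox.Fill.Transparency = 1                     ' revert: fully transparent
    End If
    wasInside = isInside    ' remembered for the next frame (passed ByRef)
End Sub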
Figure 4.2: (a) Initial state of an object. (b) To select the object, a circling gesture is used. (c) The final state of the object after selecting.
Another possible use of gestures is for navigation. It is natural to move forward to the next piece of information by using a left to right sweeping gesture and move backward by a right to left gesture. A study done by [22] found that this method for navigation is much more efficient than pointing and clicking the back button of a web browser.
Chapter 5
Implementation

The proposed solution discussed in the previous chapter is evaluated with an implementation in a presentation environment. We have implemented these ideas as a software add-in to Microsoft PowerPoint called LasIRPoint. In this chapter, we look at the setup of LasIRPoint and at how highlighting, gestures and navigation are done in this system. We examine in detail how these features are implemented inside PowerPoint. Finally, we discuss the technical difficulties we encountered and the limitations imposed.
5.1 System Overview

LasIRPoint uses an infrared tracking device called Smart-Nav, developed by NaturalPoint. It is a hands-free infrared mouse replacement device designed to allow computer users with disabilities to control a mouse pointer by moving their head, hand or fingertip. It works by using an infrared camera to track the movements of small dots of reflective material which are illuminated by infrared LEDs. Alternatively, an active source of infrared illumination can be used, such as an infrared LED light. The camera is placed directly in front of the user so that it can sufficiently capture a low-powered infrared light held by the user. This technology gives good results at a fraction of the cost of the more professional movement tracking systems discussed in Chapter 2.

Figure 5.1: Infrared laser tracking device Smart-Nav, developed by NaturalPoint.
The setup is described below. As we are scaling the use of Smart-Nav to capture a large display area, we need to position it so that its field of vision matches the image from the data projector. Obviously, the most appropriate place is somewhere close to the projector. Figure 5.2 illustrates this: the larger rectangle represents the data projector, while the smaller rectangle represents the Smart-Nav tracking camera. The camera needs to be correctly oriented so that it can capture the entire screen. The user can then point to the four corners of the screen one by one and the camera can calculate the calibration required.

Figure 5.2: The setup of the system.

Figure 5.3 is a diagram of the overall system, including the links between the components. A computer is connected to a data projector which displays an image on the screen (a, b). The camera is attached to the computer via a Universal Serial Bus (USB) cable, as shown in (c). The next piece of equipment required is a device that can produce infrared illumination. An infrared laser pointer is needed to produce a bright spot on the display so that the tracking camera can follow its movement (d, e). Most consumer-level laser pointers use visible light rather than infrared. We therefore purchased an infrared laser diode module (figure 5.4) and fitted it inside an appropriate housing for users to hold. The user can then stand anywhere in the room pointing this infrared laser pointer at the screen, with the camera tracking where the user is pointing.

Figure 5.3: A block diagram of the system.

Figure 5.4: An infrared laser diode module.
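The calibration arithmetic itself is not spelled out here, so the following is only a sketch of one plausible mapping, under the assumption that the camera view is roughly axis-aligned with the screen: the four corner points recorded during calibration give the screen's extent in camera coordinates, and each subsequent camera position is linearly rescaled into slide coordinates. A real setup would also have to deal with axis flips and keystone distortion, which this ignores; the variable names are ours.

' Sketch only: linear rescaling from camera coordinates to slide coordinates,
' using the extremes of the four calibration corners (assumed to be filled in
' by the calibration step before this is called).
Dim camLeft As Single, camTop As Single        ' smallest x and y among the four corners
Dim camWidth As Single, camHeight As Single    ' extent of the screen in camera units

Sub CameraToSlide(camX As Single, camY As Single, _
                  ByRef slideX As Single, ByRef slideY As Single)
    Dim w As Single, h As Single
    w = ActivePresentation.PageSetup.SlideWidth
    h = ActivePresentation.PageSetup.SlideHeight
    slideX = (camX - camLeft) / camWidth * w
    slideY = (camY - camTop) / camHeight * h
End Sub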
On the software side, there are three important features in our implementation. The first is that objects on a PowerPoint slide are highlighted temporarily when the pointer moves inside the bounding box of an object, indicating a selection candidate. When the pointer leaves the object, it reverts to its original appearance. This makes use of the hotspot technique described in Chapter 4. The second is gesture-based selection, where the user makes a circling action with the laser pointer and the object being circled is highlighted permanently (even when the pointer leaves), indicating a selection. Selected objects can be unhighlighted by making the same circling gesture again. Finally, the user can move between slides in the slideshow by dwelling on either of two arrows situated in the bottom corners of the display.
5.2 Highlight

Each PowerPoint slide consists of a user-defined set of shapes. A shape can be a title box, textbox, rectangle, circle, image or any other shape supported by PowerPoint. In our system, these shapes are referred to as original shapes. For each original shape, we create a highlight box, initially invisible (that is, 100% transparent). These highlight boxes are needed to perform outline and background highlighting independently of the shape of the object, making it much easier and more consistent to modify the state of the highlights.
There are four highlight states for each shape, which are listed in the following table. The colour and transparency of the shape’s highlight box depends on the state the shape is in.
State   Highlighted   Pointed at   Highlight      Description
1       False         False        Transparent    Original state.
2       False         True         Grey           The shape is being pointed to (selection candidate).
3       True          False        Light yellow   The shape is permanently highlighted but the pointer is not inside it (the object is selected).
4       True          True         Yellow         The shape is permanently highlighted and is also being pointed to.
The first state is the original state of the highlight boxes, which is transparent. The colour yellow indicates that the shape has been permanently highlighted, or selected; therefore states 3 and 4 contain some tint of yellow. Light colours indicate that the shape is not currently being pointed to: state 1 is white with 100% transparency and state 3 is a faint yellow. Darker colours indicate that the shape is currently being pointed to: state 2 is grey and state 4 is full yellow.
When the cursor moves into the bounding box of a shape, the highlight box of that shape is highlighted with grey (figure 5.5). To produce the effect of outline highlighting, the highlight boxes are wider and taller than the bounding box (figure 5.5b).

Figure 5.5: (a) A screen shot illustrating the initial state of a rectangle and a textbox. (b) The rectangle is selected. (c) The textbox is selected.
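To make the mechanics concrete, here is a sketch of how highlight boxes of this kind could be created and recoloured through the PowerPoint object model. The helper names, the naming prefix, the margin parameter and the exact RGB values are illustrative choices of ours rather than the thesis code; the four states follow the table above.

' Sketch: create a slightly oversized rectangle for each original shape,
' then recolour it according to the four highlight states.
Function AddHighlightBox(sld As Slide, shp As Shape, margin As Single) As Shape
    Dim box As Shape
    Set box = sld.Shapes.AddShape(msoShapeRectangle, _
        shp.Left - margin, shp.Top - margin, _
        shp.Width + 2 * margin, shp.Height + 2 * margin)
    box.Name = "LIR_hl_" & shp.Id          ' assumed naming convention
    box.Fill.Transparency = 1              ' state 1: invisible
    box.Line.Visible = msoFalse
    box.ZOrder msoSendToBack               ' keep the original shape readable
    Set AddHighlightBox = box
End Function

Sub SetHighlightState(box As Shape, selected As Boolean, pointedAt As Boolean)
    ' Recolour a highlight box according to the four states in the table.
    If Not selected And Not pointedAt Then
        box.Fill.Transparency = 1                       ' state 1: transparent
    Else
        box.Fill.Transparency = 0.3
        If Not selected And pointedAt Then
            box.Fill.ForeColor.RGB = RGB(192, 192, 192) ' state 2: grey
        ElseIf selected And Not pointedAt Then
            box.Fill.ForeColor.RGB = RGB(255, 255, 180) ' state 3: light yellow
        Else
            box.Fill.ForeColor.RGB = RGB(255, 255, 0)   ' state 4: yellow
        End If
    End If
End Sub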
5.3 Gestures

To recognise the circling gesture, one option is a gesture library such as LibStroke [23]. However, the main requirement of such libraries is that the gesture has a clearly defined start and end. In our paradigm, it is impossible to determine these start and end coordinates, for the following reasons. First, if the laser pointer is assumed to be switched on at all times, the start and end coordinates are recorded only when the laser pointer leaves the camera's view, which is highly unsatisfactory. Second, if the user can switch the laser pointer on and off at will, we cannot be certain when such an action takes place; it is therefore possible that more than one gesture has been performed during that time. Due to these restrictions we decided not to use these libraries, but instead to develop our own method. The only gesture that needs to be recognised is the circling action, which simplifies the problem.
We initially proposed that the circling action be restricted to within the bounding box of the object of interest. However, users intuitively tend to make a loop around the outside of an object rather than inside it. Therefore, we created another bounding box, named the "external box", for each object. This box is significantly larger than the original bounding box and the highlight box, by around 20 pixels in width and height (figure 5.6b). With this configuration, the pointer is only deemed to have left the shape when it has also left the external box, making it easier to perform a successful gesture.
Figure 5.6: (a) Initially, a circling gesture must be made inside the bounding box of an object. (b) To make it easier to perform gestures, an external box is implemented, allowing users to go outside the boundary of the object.

Initially, we proposed a special rectangular box along each side of the shape (figure 5.7). These boxes are called gesture boxes. A gesture is deemed successful only when all four gesture boxes have been pointed to in order. However, this was found to be too time consuming, because the system has to test whether the pointer is inside any of the gesture boxes, particularly as the locations of these boxes are unrelated. It was also difficult for users to point to all four boxes. To allow easier gesturing, we proposed that these boxes extend to the corners of the object, as shown in figure 5.8, making trapezium-like shapes. It was then realised that the gesture boxes could extend to the centre of the object, dividing the object into four triangular gesture boxes (figure 5.9).
Figure 5.7: "Gesture boxes" – special rectangular boxes along each edge of an object.

Figure 5.8: Trapezium-like gesture boxes.

Figure 5.9: Triangular gesture boxes.
However, it is relatively time consuming to determine whether the pointer is inside a triangle. We also found that the gesture boxes were too easily crossed: for example, if the cursor enters the corner of an object by accident, two gesture boxes will then be selected. To reduce this problem somewhat, we turned the triangles into rectangles (figure 5.10).
Figure 5.10: Rectangular gesture boxes.
When the user wants to select an object, they point to all four gesture boxes of that object in turn. Furthermore, the pointer must stay within the external box of the shape for the duration of the gesture (figure 5.11a). When the gesture is successful, the colour of the highlight box changes to yellow, indicating that the shape has been selected (figure 5.11b).
Figure 5.11: (a) The circling gesture must stay inside the external box of the object. (b) A successful selection of the object.
We also found that objects could be selected even without a full circling gesture. Figure 5.12(a) shows how this can happen: the user only needs to make a U-shaped stroke for all four boxes to be crossed. In fact, the gesture stroke can be an almost straight line that turns through an angle of just over 180° (figure 5.12b).

Figure 5.12: (a) A U-shaped gesture recognised as a circling gesture. (b) An extreme case where a stroke turning just over 180° is also recognised as a circling gesture.
To prevent this from happening and to accept only correctly circled gestures, we require that a series of six gesture boxes be entered in the correct order before a selection or deselection can occur. This also introduces the distinction between a clockwise gesture and an anti-clockwise gesture: a gesture is only completed when a series of six gesture boxes has been entered in one constant direction. Two examples are given in figure 5.13. With these constraints, the user must make a gesture that turns through at least 360 degrees, as shown in figure 5.14.
Figure 5.13: Two correct circling gestures.
Figure 5.14: An extreme case where a line just above 360º can also be recognised as being a circling gesture.
Users tend to circle more than once to make sure that a gesture is registered. For this reason, we imposed the restriction that only one gesture can be made while the pointer is inside an object. This prevents selecting and unselecting the same object in the same action. If the user wants to remove the selection of the same object, they first need to leave the object by moving the pointer away from it, and then return to make the unselecting gesture.
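The sketch below shows one way the six-box rule could be coded. The representation of the four gesture boxes as quadrant numbers 0–3 (clockwise) and all of the names are our own simplification, not the thesis source: a run of consecutive same-direction box entries must reach six before the selection toggles, and the run resets when the direction reverses or a box is skipped (leaving the external box would call ResetGesture).

' Sketch: detect a circling gesture as six consecutive gesture-box entries
' in one constant direction (boxes numbered 0-3 clockwise around the shape).
Dim lastBox As Integer      ' last gesture box entered (-1 = none)
Dim direction As Integer    ' +1 clockwise, -1 anti-clockwise, 0 unknown
Dim runLength As Integer    ' consecutive box entries in the current direction

Sub ResetGesture()
    lastBox = -1
    direction = 0
    runLength = 0
End Sub

Function EnterGestureBox(box As Integer) As Boolean
    ' Call when the pointer enters gesture box 0-3; returns True when a
    ' full circling gesture completes.
    EnterGestureBox = False
    If lastBox >= 0 And box <> lastBox Then
        Dim stp As Integer
        If (lastBox + 1) Mod 4 = box Then
            stp = 1                        ' clockwise step
        ElseIf (box + 1) Mod 4 = lastBox Then
            stp = -1                       ' anti-clockwise step
        Else
            stp = 0                        ' a box was skipped: not a valid step
        End If
        If stp <> 0 And stp = direction Then
            runLength = runLength + 1
        ElseIf stp <> 0 Then
            direction = stp
            runLength = 2                  ' a new run: the previous box and this one
        Else
            direction = 0
            runLength = 1
        End If
        If runLength >= 6 Then
            EnterGestureBox = True         ' toggle the selection of the shape
            ResetGesture
        End If
    ElseIf lastBox < 0 Then
        runLength = 1                      ' first box of a potential gesture
    End If
    lastBox = box
End Function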
5.4 Navigation

Initially, we intended to use sweeping gestures for changing to the next and previous slide. However, these gestures are not feasible, for the following reasons. First, navigation gestures rely on the start and end coordinates: the system would need to check that they lie approximately on a straight line and that the stroke spans the width of the display. As a result, a gesture library would be required and, as mentioned previously, this is not easy to achieve. Second, after implementing gesture boxes, we realised that the circling action may interfere with the sweeping action. What happens when a shape is as wide as the display? During a circling gesture, parts of it may appear to be sweeping actions, causing a large number of false slide changes. We therefore used dwelling instead, similar to the active regions proposed by [24]. We added arrows in the bottom corners of the display to allow users to navigate between slides: a right arrow at the bottom right of the display and a left arrow at the bottom left. To move to the next slide, the user points to the right arrow and dwells until the system reacts after a set period of time t. Similarly, pointing to the left arrow moves to the previous slide. The arrows highlight in a similar fashion to normal objects.
Figure 5.15: Visual cue to emphasise leaving the arrow before the slide can change again.
An important issue addressed here is that dwell clicking requires the pointer to be inside the bounding box of one of the arrows. After a period of time t the slide is changed, but the pointer remains inside that arrow. If the user does not move the pointer away from the arrow, then after another period t the slide will change again. This can cause a series of "rolling changes", which are clearly undesirable. To prevent this, we added the restriction that after a slide change the pointer must first leave the arrow before the slide can change again. This rule is emphasised using visual cues. Figure 5.15(a) shows the state of the arrow before the pointer enters it. Once inside, the arrow is highlighted, giving the user a visual cue that the arrow has been entered, as shown in (b). After time t, a slide change occurs: in this case, because the arrow is a right arrow, the next slide is presented. At this point the pointer remains inside the arrow, but the arrow is no longer highlighted, indicating to the user that the arrow is not activated. The user is thus forced to move away from the arrow before they can continue to the next slide.
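A sketch of this dwell-and-re-arm logic is given below. The frame counter stands in for the period t (the implementation expresses it as a number of camera frames through the "frames" setting described in Section 5.5); the names and the inhibit flag are our own illustration rather than the thesis code.

' Sketch: dwell-based slide change with re-arming to avoid "rolling changes".
Dim dwellFrames As Long     ' consecutive frames the pointer has been on the arrow
Dim inhibited As Boolean    ' True right after a change, until the pointer leaves the arrow

Sub UpdateNavigation(insideRightArrow As Boolean, framesForDwell As Long)
    If insideRightArrow Then
        If Not inhibited Then
            dwellFrames = dwellFrames + 1
            If dwellFrames >= framesForDwell Then
                SlideShowWindows(1).View.Next   ' advance to the next slide
                inhibited = True                ' arrow stays inert until left
                dwellFrames = 0
            End If
        End If
    Else
        inhibited = False                       ' leaving the arrow re-arms it
        dwellFrames = 0
    End If
End Sub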
5.5 An Application – PowerPoint add-in

The software implementation uses Visual Basic for Applications (VBA) for PowerPoint and the ActiveX tracker camera API supplied by NaturalPoint to control and receive data from the Smart-Nav tracking camera.
We will now examine the actual implementation in VBA. An icon is placed on the toolbar so that the LasIRPoint system can be activated (figure 5.16). Once activated, the user can go to any slide and, as usual, open up a slideshow, at which point LasIRPoint will also start up. Initially, a dialog box appears on top of the slideshow window so that camera settings can be adjusted as required (figure 5.17).

Figure 5.16: Icon created in PowerPoint.

Figure 5.17: The settings dialog box.
The most important element of this form is the "ObjectTracker Control", a large black square occupying most of the form. This square is a representation of what the tracker camera observes. When infrared illumination with a wavelength above 880 nm is within the view of the camera, the camera tracks the infrared objects and displays them in this black square. The illumination is displayed as a yellow patch, as can be seen in figure 5.17. Some of the most important settings are: "highsize" – the number of pixels the highlight box should extend beyond the bounding box of the object; "extsize" – the number of pixels the external box should extend beyond the highlight box; and "frames" – the number of frames of dwelling required for a dwell click to activate.
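These settings map naturally onto a few module-level variables that the rest of the code can read. The sketch below shows one way to hold them; the default values are purely illustrative, as they are not stated here.

' Sketch: module-level settings read from the dialog (default values are
' illustrative only).
Public highsize As Long    ' pixels the highlight box extends beyond the bounding box
Public extsize As Long     ' pixels the external box extends beyond the highlight box
Public frames As Long      ' camera frames of dwelling needed to trigger a dwell click

Public Sub LoadDefaultSettings()
    highsize = 5
    extsize = 20      ' consistent with the ~20 pixel figure quoted in Section 5.3
    frames = 30       ' roughly one second if the camera runs at 30 frames per second
End Sub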
So far, the camera has not yet been switched on. To do so, the user must click on the "open" button with the standard computer mouse. This selects the default camera 0, which represents the index of the first and only camera connected to the PC. The tracking camera is then ready to receive illumination data. Whenever infrared illumination is detected, the camera transmits a string of data back to the program via a callback method. This string contains three main elements: the positions of the edges of the captured object, the area occupied by the object, and the x and y coordinates of the weighted centre of the object. This information is used to compute the exact location of the cursor in terms of the PowerPoint slide at each frame.
The Smart-Nav device comes with software for emulating a mouse. Rather than using this software we have chosen to process the data stream from the camera directly. This gives us higher flexibility and more freedom in our tasks. Specifically, we can calculate the amount of distortion and correct it where necessary; we can also hide the mouse pointer on screen. Furthermore, we can process data coming from the camera for each frame. This is especially useful for dwelling, since it helps to determine how many frames the pointer has been dwelling for.
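The exact callback signature belongs to the NaturalPoint ActiveX API and is not reproduced here, so the sketch below assumes a simplified handler that is handed the weighted centre of the detected illumination each frame. The names are ours, and it reuses the CameraToSlide and InsideBounds helpers sketched earlier; it simply ties the pieces together: convert camera coordinates to slide coordinates, test each shape on the current slide, and leave room for the gesture and dwell bookkeeping.

' Sketch of the per-frame processing, assuming the weighted centre of the
' infrared blob has already been extracted from the camera's data string.
Sub OnCameraFrame(blobFound As Boolean, camX As Single, camY As Single)
    If Not blobFound Then Exit Sub             ' no infrared illumination this frame

    Dim x As Single, y As Single
    CameraToSlide camX, camY, x, y             ' coordinate mapping sketched in Section 5.1

    Dim shp As Shape
    For Each shp In SlideShowWindows(1).View.Slide.Shapes
        If InsideBounds(x, y, shp) Then
            ' hotspot highlighting and gesture-box bookkeeping would go here
        End If
    Next shp

    ' finally, the dwell counters for the navigation arrows are updated
End Sub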
5.6 User Cases

The following storyboard illustrates a usage case of our system. Our user, Fred, will demonstrate the use of highlighting, gestures and navigation with his presentation slides in Microsoft PowerPoint.
1. Fred opens Microsoft PowerPoint, starts running his slide show, activates Smart-Nav and is now ready to go. This particular slide contains a heading, three sentences and a diagram containing a single box. The navigation arrows are located at the bottom left and bottom right corners of the slide. Fred is holding an infrared laser pointer and is currently pointing at the location on the slide marked with the target x. (This mark and the dotted line are invisible and are only shown for illustration purposes.)
2. (Now we zoom in to get a closer look at the box.) Fred is about to enter the box with his laser pointer.

3. His laser pointer is currently pointing at the boundary of the box.

4. It now arrives at the centre of the box and the box is highlighted with grey. This indicates that the object is now a selection candidate.

5. Fred decides that he wants to select this object, so he makes a circling gesture. The box is then highlighted with solid yellow, indicating that it is now selected.

6. Fred now moves his laser pointer away from the object. The box is highlighted with faint yellow.
7. The laser pointer is now back inside the box, which is highlighted with solid yellow again.

8. To deselect the object, Fred makes the circling gesture once again and the box returns to its original appearance.

9. Fred now decides that he wants to navigate to the next slide. He moves his laser pointer towards the right arrow.

10. The laser pointer enters the arrow and the arrow is highlighted with grey.

11. One second later, the system reacts with a slide change. It also removes the highlight on the arrow so that Fred cannot change slide again until his pointer leaves the arrow.
5.7 Optimisation

The most time-consuming operation is the creation of the gesture boxes and highlight boxes. To save time, these are all created at the very beginning of the slideshow. This saves time during the slideshow at the cost of some extra memory. At the end of the slideshow, all of the gesture boxes and highlight boxes are deleted so that the slides are returned to their original state.
Another major optimisation concerns the main loop. The main loop takes the current coordinates of the pointer (x, y) and traverses all the objects on the slide to see which shapes the pointer is inside. This requires a large amount of computation, since the loop is run for every frame received and it requires testing every shape on the slide, including the original shapes, highlight boxes, external boxes and gesture boxes. This is optimised so that only the original shapes are traversed, through the use of a shape collection data structure. At the beginning of each slide, that is, when a slide change occurs or when the slideshow starts, a shape collection containing only the original shapes is precomputed. When a frame is received from the camera, the main loop traverses only the shapes in this collection. This greatly increases the speed of processing.
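A minimal sketch of this precomputation is shown below, using a VBA Collection. The naming prefix used to tell our added boxes apart from the original shapes is an assumption of ours.

' Sketch: rebuild the collection of original shapes whenever a slide appears.
Public originalShapes As Collection

Public Sub RebuildShapeCollection(sld As Slide)
    Set originalShapes = New Collection
    Dim shp As Shape
    For Each shp In sld.Shapes
        ' Assumed convention: boxes we add ourselves are named with a prefix,
        ' so anything else is treated as an original shape.
        If Left$(shp.Name, 4) <> "LIR_" Then
            originalShapes.Add shp
        End If
    Next shp
End Sub

' In the main loop, only the precomputed collection is traversed:
'   For Each shp In originalShapes
'       ...hotspot and gesture tests...
'   Next shp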
To determine whether the coordinates lie inside a shape, we need four tests, one for each side of the shape's bounding box. Short-circuit evaluation would be very useful here, since it would allow processing of the rest of the condition to stop as soon as a false term is reached. Unfortunately, Visual Basic does not short-circuit its logical operators, so we instead structure the computation as nested If statements. The shape is first tested horizontally, then vertically. This is because most slides contain textboxes, and textboxes are usually wider than they are tall; since testing horizontally reduces the area of the search field more than testing vertically, we test the objects horizontally first (figure 5.18).
Figure 5.18: (a) Initial search area. (b) Area still to search after testing horizontally. (c) Area still to search after testing vertically.
Figure 5.19: The order of the If statements.
Figure 5.19 shows the order in which the coordinates of an object are tested; the corresponding nested If statements take the following form.

If y >= curShape.Top Then
    If y <= curShape.Top + curShape.Height Then
        If x >= curShape.Left Then
            If x <= curShape.Left + curShape.Width Then
                ' the pointer lies inside curShape's bounding box
            End If
        End If
    End If
End If