USING MICROSOFT KINECT SENSOR TO PERFORM COMMANDS ON VIRTUAL OBJECTS

Final Project Paper
Master in Computer Science
02.10.2012

Author: Simon Brunner
Supervisors: Denis Lalanne, Matthias Schwaller
Abstract

Gestural user interfaces are becoming omnipresent in our daily lives, with more or less success. The Microsoft Kinect is currently the leading hands-free recognition device. Its gestures are mostly designed for leisure applications such as mini-games that require the user to stand in a large environment, far from the screen. The Kinect for Windows sensor reduces this distance for desktop users with its near mode. This project studies the possibility of developing subtle gestures to perform basic gestural interactions and operate a computer with accuracy. Two different sets of gestures were designed for close range and compared during the course of this project. The goal is to use them in small repetitive tasks and evaluate their performance against each other, to see whether one or several types of gestures work better than others. The designed gestures operate four commands that are evaluated with users: selection, drag and drop, rotation and resizing of objects.

The first part of this master project paper presents the technological features and how to go from Kinect data acquisition to the creation of functional gestures. The following part concerns the design of the gestures. A description of all the designs with their pros and cons is provided in tables showing the evolution of the gestures. The gestures are divided into two groups: the technological and the iconic ones. On one hand, the technological gestures aim at efficiency and reliable recognition regardless of the users' expectations. On the other hand, the iconic gestures also aim to be efficient, but priority is given to their naturalness, memorability and ergonomics for users.

Another important part of this project concerned the creation of a full application for testing each gesture with users in four simple activities, and then all gestures together, by group, in a final activity. This report ends with the results and analysis of a within-subject user evaluation organized with 10 persons. Results show that the iconic selection has quantitatively equivalent performance to the technological one but is perceived as more comfortable and usable by users. Further, the iconic zoom and rotation have significantly better results according to statistical tests. Finally, iconic gestures are individually better and/or favored by users over the technological gestures, while the two groups had similar performance during the final tasks combining the four commands.
Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
List of Graphs
Glossary
1. Introduction
   1.1 Context
   1.2 Goal
   1.3 Constraints
   1.4 Thesis structure
2. Technology
   2.1 Kinect Sensor for Windows
   2.2 The Depth Sensor
   2.3 The Near Mode
   2.4 Kinect SDK
   2.5 Choice of the library
   2.6 Candescent NUI
3. Using Candescent NUI
   3.1 Bases to start with Candescent NUI
   3.2 Kinect field of view
   3.3 Hands detection
   3.4 Fingers detection
   3.5 Wrong fingers detection
   3.6 Finger count stabilizer
   3.7 Close hands
   3.8 Layers and feedback
4. Gestures Design
   4.1 Context
5. Gesture Recognition Development
   5.1 Iconic gestures 1
      5.1.1 Description
      5.1.2 Technical and Operational Advantages & Disadvantages
      5.1.3 Quick Summary
   5.2 Iconic gestures 2
      5.2.1 Description
      5.2.2 Technical and Operational Advantages & Disadvantages
      5.2.3 Quick Summary
   5.3 Iconic gestures 3
      5.3.1 Description
      5.3.2 Technical and Operational Advantages & Disadvantages
      5.3.3 Quick Summary
   5.4 Iconic gestures 4
      5.4.1 Description
      5.4.2 Technical and Operational Advantages & Disadvantages
      5.4.3 Quick Summary
   5.5 Technologic gestures 1
      5.5.1 Description
      5.5.2 Technical and Operational Advantages & Disadvantages
      5.5.3 Quick Summary
   5.6 Technologic gestures 2
      5.6.1 Description
      5.6.2 Technical and Operational Advantages & Disadvantages
      5.6.3 Quick Summary
   5.7 Technologic gestures 3
      5.7.1 Description
      5.7.2 Technical and Operational Advantages & Disadvantages
      5.7.3 Quick Summary
   5.8 Technologic gestures 4
      5.8.1 Description
      5.8.2 Technical and Operational Advantages & Disadvantages
      5.8.3 Quick Summary
6. The selected gestures for the evaluation
   6.1 The final choice
   6.2 The Iconic choice
   6.3 The Technologic choice
7. The test application
   7.1 The description of the application
   7.2 The Levels
      7.2.1 Training
      7.2.2 Selection
      7.2.3 Move
      7.2.4 Rotation
      7.2.5 Resizing
      7.2.6 Final
   7.3 Editable levels
   7.4 The general interface
      7.4.1 The pointer
      7.4.2 The commands
   7.5 The layers
   7.6 The objects
   7.7 The targets
   7.8 The feedbacks
   7.9 The animations
   7.10 The measures
   7.11 The logs
   7.12 About the applications
   7.13 The Class diagram
8. The Evaluation
   8.1 Conditions of the evaluation
   8.2 Pre-evaluation
   8.3 Range of testers
   8.4 The questionnaire
   8.5 Results
      8.5.1 Selection
      8.5.2 Move
      8.5.3 Rotation
      8.5.4 Resizing
      8.5.5 Final
      8.5.6 Summaries
      8.5.7 Questionnaire
   8.6 Analysis
9. Extra applications
   9.1 Gesture Factory application
   9.2 Bing Map application
10. Conclusions
   10.1 General comments over the development
   10.2 Conclusion
   10.3 Future work
11. References
List of Figures

Figure 1: Kinect Specs
Figure 2: IR light dots
Figure 3: PrimeSensor's objective
Figure 4: Light coding 2D depth images
Figure 5: Near mode vs Default mode
Figure 6: Candescent NUI
Figure 7: Candescent NUI area of detection
Figure 8: Fingers' order: 1: Red, 2: Blue, 3: Green, 4: Yellow, 5: Pink
Figure 9: Hands getting too close to each other
Figure 10: Zoom, Rotation & Selection of Iconic 1
Figure 11: Full circle detections
Figure 12: Zoom, Rotation & Selection of Iconic 2
Figure 13: Horizontal hand not detected
Figure 14: Circle movement detected
Figure 15: Selection for Iconic gestures 3
Figure 16: Zoom, Rotation & Selection of Iconic 4
Figure 17: Rope technique angle example
Figure 18: Thumb click operations
Figure 19: Thumb click state diagram
Figure 20: Zoom Technologic 2
Figure 21: Progressive zoom from Technologic 3
Figure 22: Progressive zoom with steps
Figure 23: Progressive zoom examples
Figure 24: Zoom, Rotation & Selection of Technologic 4
Figure 25: Chosen gestures for Iconic
Figure 26: Chosen gestures for Technologic
Figure 27: Setup windows form
Figure 28: Progression of the level path
Figure 29: Panel of the 6 activities
Figure 30: Appearance order on circle
Figure 31: Move level demonstration
Figure 32: Rotation level demonstration
Figure 33: Resizing level demonstration
Figure 34: Final level demonstration
Figure 35: XML piece example
Figure 36: Layers classes structure
Figure 37: Layers' levels structure
Figure 38: Demonstration of zoom and rotation on a composite object
Figure 39: Target example
Figure 40: Left hand feedbacks
Figure 41: Detection limits feedbacks
Figure 42: Feedback of the frame for the detection area
Figure 43: Text & button feedback
Figure 44: Technologic feedbacks
Figure 45: Iconic feedbacks
Figure 46: Object's status color feedback
Figure 47: Object's middle dot feedback
Figure 48: Application class diagram
Figure 49: Questionnaires for evaluation
Figure 50: Index of performance formulas
Figure 51: Gestures factory & Bing map application

List of Tables

Table 1: Selection gestures design
Table 2: Zoom gestures design
Table 3: Rotation gestures design
Table 4: Iconic gestures 1 summary
Table 5: Iconic gestures 2 summary
Table 6: Iconic gestures 4 summary
Table 7: Technologic gestures 1 summary
Table 8: Technologic gestures 2 summary
Table 9: Technologic gestures 3 summary
Table 10: Technologic gestures 4 summary
Table 11: Within-subjects experiment progress
Table 12: Selection: times t-test table
Table 13: Selection: tries t-test table
Table 14: Indexes of difficulty with average throughputs
Table 15: Move: times t-test table
Table 16: Move: tries t-test table
Table 17: Rotation: times t-test table
Table 18: Resizing: times t-test table
Table 19: Final: times t-test table
Table 20: Final: tries and errors t-test table
Table 21: Activities summaries table
Table 22: Activities errors summaries table
Table 23: Questionnaires' results table

List of Graphs

Graph 1: Average time for each task from Technologic and Iconic side by side
Graph 2: Selection: average time's comparison by task
Graph 3: Selection: average time's comparison by tester
Graph 4: Histogram + distribution curves + densities
Graph 5: Box plot Selection Technologic and Iconic
Graph 6: Selection: number of tries by testers
Graph 7: Average time for each task from Technologic and Iconic side by side
Graph 8: Move: average time's comparison by task
Graph 9: Move: average time's comparison by tester
Graph 10: Histogram + distribution curves + densities
Graph 11: Box plot Move Technologic and Iconic
Graph 12: Move: number of tries comparison
Graph 13: Rotation: average time's comparison by task
Graph 14: Rotation: average time's comparison by tester
Graph 15: Histogram + distribution curves + densities
Graph 16: Box plot Rotation Technologic and Iconic
Graph 17: Rotation: number of tries comparison
Graph 18: Resizing: average time's comparison by task
Graph 19: Resizing: average time's comparison by tester
Graph 20: Histogram + distribution curves + densities
Graph 21: Box plot Resizing Technologic and Iconic
Graph 22: Resizing: number of tries comparison
Graph 23: All times for each task from Technologic and Iconic side by side
Graph 24: Final: average time's comparison by tester
Graph 25: Histogram + distribution curves + densities
Graph 26: Final: number of tries comparison
Glossary

NUI: natural user interface
SDK: software development kit
SoC: system on a chip
Continuous movement: the movement of the gesture can be done indefinitely, without repositioning to an initial position, as long as the command is active. Opposite of limited movement.
Limited movement: movement with a physical limit, like stretching the arm to its maximum. It requires releasing and repositioning to extend unfinished movements.
Progressive movement: movement based on the position difference. The execution follows the direct movement progressively.
Chapter 1: Introduction

1.1 Context
1.2 Goal
1.3 Constraints
1.4 Thesis structure

This first chapter is an introduction. It presents the context of the project and the reasons behind it, gives an overview of how the thesis is structured, and lists the subjects treated.
1. Introduction

1.1 Context
It is no surprise that natural user interfaces have become essential over the last few years. Phones, tablets, video game systems, TVs: they all use new ways to interact with machines, whether through touch, gesture recognition or voice recognition. Kinect offers a new hands-free experience by adding depth to video processing. It allows the creation of interfaces where users no longer need to hold any device. Kinect has already proven its capability to point at objects accurately using only one hand. Nevertheless, the commands provided by Kinect are limited to selection: an object is selected by pointing at it and waiting a few seconds without moving. This technique works well, and the pointer remains stable because no extra gesture is required, but it limits the interaction to a single command. Adding new commands means introducing new gestures, and these gestures alter the stability of the pointer and make it hard to be accurate. The idea is therefore to extend the pointing by adding commands controlled with the other hand only, so that the pointing keeps its steadiness.
1.2 Goal
The goal of the project is to explore and find multiple ways to perform common tasks on objects, such as moving, rotating, enlarging and reducing them. The Microsoft Kinect sensor is the technology used for this project, and gesture recognition drives the manipulations. One hand handles the pointing while the second hand takes care of the commands. The focus of the project is on the second hand, the one performing the commands. Various alternative techniques are evaluated to estimate their efficiency relative to one another.
1.3 Constraints
We decided to use the official Microsoft Kinect SDK. As the focus of the project is on gesture recognition rather than hand and finger detection, I was allowed to find and select a library of my choice, as long as it takes advantage of the Kinect. The commands must be executed only with the left hand. The right hand is only a support: it points at objects and moves them for the drag and drop, nothing more. No other combinations between the hands are allowed.
1.4 Thesis structure
The thesis is divided into several chapters that take the reader from the beginning of the project to its end. First, as already seen, the introduction lays out the bases of the project: the context, the objectives it tries to achieve and the few constraints that need to be followed.
Then, the technologies used in this project are described: why they were chosen and how they work. Candescent NUI is the library used for the gesture recognition, and the technologies are presented with their technical aspects. Chapter 3 explains how the Candescent library works and what its qualities and flaws are. Chapter 4 covers the design part: each command is briefly described with a sketch, a text explaining how it works, and the pros and cons of the design from both the user's and the developer's points of view. The next chapter treats each developed command in a more detailed and technical way, with the problems met and the solutions found during development. The evaluation cannot test all the commands, as it would take too much time, so a few commands had to be chosen; chapter 6 justifies those choices. The seventh chapter is all about the evaluation application: the interface, the level mechanics, the customization of the tests, etc. Chapter 8 treats the results of the evaluation: the conditions under which the tests were done, the type of testers taking part, the results and their analysis. The next chapter presents the extra applications developed during the project: the Gesture Factory application, which was used to develop and pre-test the commands before validation, and the Bing Map application, which was planned as a real-condition test for the evaluation. To finish, the last chapter gives insight into the development, what could be improved, changed or removed if the application had to be reused in another project, and a conclusion.
Chapter 2: Technology

2.1 Kinect Sensor for Windows
2.2 The Depth Sensor
2.3 The Near Mode
2.4 Kinect SDK
2.5 Choice of the library
2.6 Candescent NUI

This second chapter focuses on the technologies used in this project. It explains what the Kinect device is and how it works, presents the new features brought by the Kinect for Windows, and introduces the library used to obtain the hand model.
2. Technology

2.1 Kinect Sensor for Windows
The Kinect sensor is a motion sensing device. Its name is a combination of "kinetic" and "connect". It was originally designed as a natural user interface (NUI) for the Microsoft Xbox 360 video game console to create a control-free experience with no need for an input controller: the user is the controller. It enables the user to interact with and control software on the Xbox 360 through gesture and voice recognition. What really differentiates Kinect from other devices is its ability to capture depth. The device is composed of multiple sensors [FIGURE 1]. In the middle it has an RGB camera allowing a resolution of up to 1280x960 at 12 images per second. The usual resolution for the colored video stream is 640x480 pixels at a maximum of 30 images per second, since the depth camera has a maximum resolution of 640x480 at 30 frames per second. On the far left of the device is the IR light projector. It projects multiple dots [FIGURE 2] which allow the camera on the right side, the CMOS depth camera, to compute a 3D environment. For audio input, the device has two microphones on each side for voice recognition. The device is mounted on a motorized tilt to adjust the vertical angle. Kinect can detect up to two users at the same time and compute their skeletons in 3D with 20 joints representing body junctions such as the feet, knees, hips, shoulders, elbows, wrists, head, etc.
Figure 1: Kinect Specs
Figure 2: IR light dots
In February 2012, Microsoft launched the Kinect for Windows. This "new" device is designed identically to the Xbox 360 one, with a few exceptions. First, as its name indicates, it is compatible with Windows operating systems and not with the Xbox 360 video game system. Second, it has a new mode called "near mode", allowing the user to be closer to the device and still be detected. Indeed, the Kinect sensor has a minimum and a maximum distance within which it detects objects properly [FIGURE 5]; out of this range, the recognition quality decreases drastically. For the original Kinect, the theoretical distance range for good recognition goes from 80 cm to 4 m. On the Kinect for Windows, the range goes from 40 cm to 3 m, a gain of 40 cm which is enough to use it at a desk instead of in a living room. The range is theoretical because, in practice, the minimum distance is larger. The Kinect for Windows is designed for developers through its Kinect SDK, an alternative to the "Kinect hacks" libraries like OpenNI.
2.2 The Depth Sensor
The depth sensor, also called the PrimeSensor, was developed by PrimeSense, an Israeli company. Its main goal is depth measurement [FIGURE 3]. The prevailing technique is usually the "time of flight" method: a beam of light is projected and the time it takes to leave the sensor and come back is tracked. PrimeSense created a new and cheaper way to do it, which compares the size and spacing of infrared dots to evaluate depth. The PrimeSensor operates a system of projected near-infrared (IR) light which is read from the scene by a standard CMOS image sensor to produce the 640x480 depth image [Citation 1]. The near-IR light is used to code the scene volume, a process PrimeSense calls "Light Coding". The IR projector casts a speckle pattern of dots onto the scene [FIGURE 4]. A SoC (System on Chip) connected to the CMOS sensor runs complex parallel algorithms to decipher the coded scene volume and create a depth image. The projected dots are static: depending on the layout of the scene, the dots are displaced differently than in an empty scene. That makes the first image. One might think that a second image comes from one of the other cameras, but this is wrong: only one camera, the CMOS depth camera, is used for depth estimation. A second image is nevertheless needed to perform stereo triangulation against the first one, so the PrimeSensor embeds in memory an image of a virtual plane pattern as a reference. With these two images, the stereo triangulation process compares the differences between the original pattern and the observed one to estimate the depth.

Figure 3: PrimeSensor's objective
Figure 4: light coding 2D depth images
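To make the triangulation step concrete, the standard stereo-triangulation relation links the observed shift of a speckle dot to its depth. This is a textbook approximation, not PrimeSense's exact (unpublished) formulation:

    Z = f * b / d

where Z is the distance of the surface from the sensor, f the focal length of the CMOS camera, b the baseline between the IR projector and the camera, and d the disparity, i.e. how far a given dot has shifted between the stored reference pattern and the observed image. Larger shifts therefore correspond to closer surfaces.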
2.3 The Near Mode
The near mode is the real improvement over the original Kinect. It comes from developers' demands to Microsoft: PC-based applications often require Kinect to focus on a closer range than Xbox applications. The Kinect for Windows can therefore see objects from 400 mm instead of the previous 800 mm. The DepthRange has to be set in the application to enable or disable the near mode. As of Kinect SDK 1.0, the near mode can only detect two objects, and skeleton tracking is not available in near mode. Nevertheless, skeleton tracking is supposed to become available in version 1.5 of the SDK, with the possibility of recognizing the upper part of the body (10 joints) for use at a desk. Two additional properties, MinDepth and MaxDepth, describe the boundaries of the Kinect depth range.
Figure 5: Near mode vs Default mode
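As an illustration, enabling the near mode with the official SDK only takes one property assignment. This is a minimal sketch assuming an already connected and started KinectSensor instance named sensor (the variable name is an assumption):

    // Enable the depth stream and switch it to near mode (Kinect for Windows SDK 1.x).
    sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
    sensor.DepthStream.Range = DepthRange.Near;   // detection starts at about 400 mm

    // MinDepth and MaxDepth report the boundaries of the currently selected range, in mm.
    int minDepth = sensor.DepthStream.MinDepth;
    int maxDepth = sensor.DepthStream.MaxDepth;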
2.4 Kinect SDK
Coming from the Microsoft Research labs, the Kinect development kit for Windows 7 allows developers to create their own interfaces and applications for Windows. It was designed to be used with C++ and C#; this paper only covers C# with Microsoft Visual Studio. Downloading and installing the latest SDK from www.microsoft.com is all that is needed. "Kinect SDK offers raw sensor stream to access to low-level streams from the depth sensor, color camera sensor, and four-element microphone array. Skeletal tracking: The capability to track the skeleton image of one or two people moving within the Kinect field of view for gesture-driven applications. Advanced audio capabilities: Audio processing capabilities include sophisticated acoustic noise suppression and echo cancellation, beam formation to identify the current sound source, and integration with the Windows speech recognition API" [1]. To create a Kinect application with Visual Studio, it is enough to add a reference to Microsoft.Kinect in .NET and to add "using Microsoft.Kinect;". For samples and examples of what can be done with the SDK, there is the "Kinect SDK Sample Browser".

[1] http://en.wikipedia.org/wiki/Kinect
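The following minimal sketch shows the kind of startup code this implies; it assumes exactly one Kinect is plugged in and omits error handling:

    using Microsoft.Kinect;

    // Take the first connected sensor, enable the color and depth streams, and start it.
    KinectSensor sensor = KinectSensor.KinectSensors[0];
    sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
    sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
    sensor.Start();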
2.5 Choice of the library
The goal of the project is to design and develop hand gestures, not to implement a hand and finger detection and representation model. So, one of the first tasks was to find a library or an open project providing hand and finger tracking. As the Microsoft Kinect SDK had been released for only a month or so at the beginning of the project, the choice of libraries was very small. Many existing projects provide hand tracking. Unfortunately, most of them use OpenNI or NITE on Linux instead of the Kinect SDK, and they rely on the skeleton tracking engine to follow the hand's movements. That means one point or joint in space and no finger tracking, which makes them unusable for hand postures involving fingers. Open-source projects like "Kinect Paint" and "Kinect Toolbox" on codeplex.com were considered but rejected, since they only use skeleton tracking. The "Kinect SDK Dynamic Time Warping (DTW) Gesture Recognition" project [2] offers the possibility of recording gestures in 2D and recognizing them, but only works with the skeleton joints. Then there was the "Tracking the articulated motion of two strongly interacting hands" project from the University of Crete: "It proposes a method that relies on markerless visual observations to track the full articulation of two hands that interact with each-other in a complex, unconstrained manner" [3]. The project was really interesting and promising, but no source code was available apart from a demo [4], and of course it was using OpenNI. Finally, one project stood out: the Candescent NUI project [5]. It provides full hand recognition with finger points, palm, depth, volume, etc. The open project originally used the OpenNI library but had just been updated to the Kinect SDK. The Candescent library offers everything this project needs, so it was selected as the base for the recognition.

[2] http://kinectdtw.codeplex.com
[3] http://www.ics.forth.gr/~argyros/research/twohands.htm
[4] http://cvrlcode.ics.forth.gr/handtracking/
[5] http://candescentnui.codeplex.com

2.6 Candescent NUI
Candescent NUI is a set of libraries created by Stefan Stegmueller. It is designed for hand and finger tracking using Kinect depth data. It has been developed in C# with OpenNI and the Microsoft Kinect SDK. The creator allows developers to use the libraries as long as the copyright remains in the project [see appendix].

Candescent NUI provides a lot of useful information for hand and finger tracking. It starts by detecting close objects, two at most. These objects are then processed to extract hand features. Careful: if an object is actually not a hand, a head for example, the algorithms slow down the application because they cannot extract the features, and they may well crash. If the objects are hands, the features are extracted: a convex hull algorithm gives the finger tip positions (X, Y, Z), their direction, etc., along with other features such as the volume of the hand, the palm position, the number of fingers, and each finger's base position and id.
Figure 6: Candescent NUI
Chapter 3: Using Candescent NUI

3.1 Bases to start with Candescent NUI
3.2 Kinect field of view
3.3 Hands detection
3.4 Fingers detection
3.5 Wrong fingers detection
3.6 Finger count stabilizer
3.7 Close hands
3.8 Layers and feedback

This chapter describes how the Candescent library works. It explains how to start a project using it, the conditions needed for the recognition to work properly, the points that need special attention, and how the data can be displayed on screen as feedback.
3. Using Candescent NUI

3.1 Bases to start with Candescent NUI
To be able to use the Candescent library, the project must reference the library's DLLs. Four DLLs are needed: CCT.NUI.Core, CCT.NUI.HandTracking, CCT.NUI.KinectSDK and CCT.NUI.Visual. They need to be added to the references of the project. To use Candescent with the Kinect SDK instead of OpenNI (which it also supports), the data source must be configured properly. To use the SDK:

    IDataSourceFactory dataSourceFactory = new SDKDataSourceFactory();

Then create the hand data source and start its thread:

    var handDataSource = new HandDataSource(dataSourceFactory.CreateShapeDataSource());
    handDataSource.Start();
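Once started, the hand data can either be polled through handDataSource.CurrentValue (as done later in this chapter) or received through the data source's event. The sketch below follows the Candescent samples available at the time; the exact event and delegate names may differ between versions of the library:

    // React to every new frame of hand data pushed by the library
    // (names taken from the Candescent samples; verify against the version in use).
    handDataSource.NewDataAvailable += (HandCollection data) =>
    {
        foreach (var hand in data.Hands)
        {
            Console.WriteLine("Hand {0}: {1} finger(s) detected", hand.Id, hand.FingerCount);
        }
    };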
3.2 Kinect field of view
The Kinect for Windows detects objects from 400 mm in near mode. With Candescent NUI, hands have to be at least 500 mm (minimum depth) from the camera to be detected correctly. The maximum distance is 800 mm; further than this, the hands are too far to be detected and tracked [FIGURE 7]. The ideal distance for tracking is around 650 mm. The user has around 600 to 850 mm to move his hands horizontally and around 400 to 500 mm vertically. The minimum and maximum depth distances can be set in the ClusterDataSourceSettings class. I tried to change those values to get more volume, but it resulted in a worse situation: a minimum depth below 500 mm does not improve the tracking, and a larger maximum depth makes the detection disastrous, because it becomes hard to keep two hands detected at the same time once further objects like the head and shoulders start attracting the detection. 500 mm to 800 mm is the best range for hand tracking without too much trouble and without having to push the chair back, away from the screen.
Figure 7: Candescent NUI area of detection
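The depth thresholds are changed through the clustering settings passed to the data source factory. The property names below are assumptions based on the library's settings class and should be checked against the version in use:

    // Hypothetical sketch: keep clustering inside the 500-800 mm band that works best.
    // Property names are assumptions; check them against the library version in use.
    var clusterSettings = new ClusterDataSourceSettings();
    clusterSettings.MinimumDepthThreshold = 500;   // mm
    clusterSettings.MaximumDepthThreshold = 800;   // mm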
3.3 Hands detection
Candescent NUI can detect up to two hands simultaneously. They are stored as a list of HandData objects in dataSource.CurrentValue. If two hands are detected, Hand[0] is always the hand on the right side and Hand[1] the one on the left side. The main data of each hand are an id, a location, a volume, a palm, fingers, a contour shape and a convex hull.
3.4 Fingers detection
The way the fingers are detected is peculiar. There is no notion of thumb, little finger or middle finger: the fingers are simply numbered from 0 to 4 for each hand. A hand can actually have more than five fingers with this library, since the fingers are vertices of the convex hull. The first finger is always the highest one, and the others are ordered clockwise. That means that if the hand rotates, the first finger and the others change position [FIGURE 8]; there is no lock on finger identity. The finger data are stored in the FingerPoints list, and FingerCount returns the current number of fingers.
Figure 8: Fingers' order: 1: Red, 2: Blue, 3: Green, 4: Yellow, 5: Pink
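The following sketch shows how this data can be read when polling the data source; the member names follow the descriptions above (CurrentValue, Hands, FingerPoints, FingerCount) and may vary slightly between library versions:

    // Read the most recent hand data and inspect the right-most hand, if any.
    var frame = handDataSource.CurrentValue;
    if (frame != null && frame.Count > 0)
    {
        var hand = frame.Hands[0];            // Hand[0] is the right-side hand when two are tracked
        int count = hand.FingerCount;         // current number of convex-hull finger candidates
        foreach (var finger in hand.FingerPoints)
        {
            // each finger point carries its tip position (X, Y, Z) and related data
        }
    }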
3.5 Wrong fingers detection
As written before, the fingers are defined by the vertices of the convex hull. These vertices are usually above the palm of the hand, but sometimes they lie below it, at wrist level, which produces false finger detections. To prevent this, the detectFingers method only takes into account fingers detected above the palm of the hand. This removes the possibility of holding the thumb beneath the palm or pointing the hand downward, but those postures are rarely used.
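An equivalent filter fits in a couple of lines. This is an illustrative version, not the library's own detectFingers method, and the Location and PalmY member names are assumptions:

    // Keep only finger candidates above the palm; in image coordinates the Y axis grows
    // downward, so "above" means a smaller Y value. (Requires using System.Linq.)
    var validFingers = hand.FingerPoints
        .Where(f => f.Location.Y < hand.PalmY)
        .ToList();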
3.6 Finger count stabilizer
As the finger detection is not perfect, some fingers occasionally disappear for one frame. There are several reasons for that: the hands are too close to or too far from the camera, the fingers are too close to each other to be differentiated, the fingers are too thin, or other kinds of perturbation occur. This problem can cause a lot of trouble. For example, if a gesture needs a precise number of fingers to stay active, it will not be stable and the experience can become painful. This is why the perturbations must be minimized with a finger count stabilizer. It is quite simple: a circular buffer keeps the past finger counts and the most common value, not an average, is extracted with the mostCommonValue method using a Dictionary. The technique is really effective, but it comes with a burden: the finger detection becomes less reactive, since a post-treatment on the most common value is performed. The bigger the buffer, the more stable but the less reactive the count becomes; too small, and the method is useless. A buffer size of about 20-25 is enough. There is also a trick to regain some reactivity in certain situations: keep the "real" current number of fingers in another attribute and use it carefully when possible, otherwise it will undo the stabilizer's work.
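A minimal sketch of such a stabilizer, with illustrative names rather than the project's actual identifiers:

    using System.Collections.Generic;
    using System.Linq;

    public class FingerCountStabilizer
    {
        private readonly int[] buffer;   // circular buffer of the last N raw finger counts
        private int index;

        public FingerCountStabilizer(int size = 25)
        {
            this.buffer = new int[size];
        }

        // Store the newest raw count and return the most common value in the buffer.
        public int Stabilize(int currentCount)
        {
            this.buffer[this.index] = currentCount;              // overwrite the oldest entry
            this.index = (this.index + 1) % this.buffer.Length;
            return MostCommonValue(this.buffer);
        }

        private static int MostCommonValue(IEnumerable<int> values)
        {
            var occurrences = new Dictionary<int, int>();        // finger count -> how often it was seen
            foreach (var value in values)
            {
                int seen;
                occurrences[value] = occurrences.TryGetValue(value, out seen) ? seen + 1 : 1;
            }
            return occurrences.OrderByDescending(pair => pair.Value).First().Key;
        }
    }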
3.7 Close hands
Candescent NUI is able to manage two hands at the same time, but the hands cannot be too close to one another; otherwise they are considered as one giant hand [FIGURE 9]. This is why applications using this library need to be careful and avoid, as much as possible, crossing the hands or bringing them near each other.
Figure 9: Hands getting too close to each other
3.8 Layers and feedback
The layers are the containers in which you can draw with C#'s System.Drawing.Graphics. There is no need to use one, but they offer the possibility to display useful feedback, at least during development. The feedback ranges from the hand contour, finger positions and palm position to any unwanted detected contour such as the head, shoulders or arms. The object to add on the form is a VideoControl from the CCT.NUI.Visual.VideoControl class. The capture resolution is 600x380. The VideoControl can of course be larger, but the feedback will always stay within the first 600x380 pixels. To upscale or translate the feedback, a pre-transformation must be applied:

    g.TranslateTransform(translationValueX, translationValueY);
    g.ScaleTransform(scaleValueX, scaleValueY);
The layers inherit from the abstract class LayerBase, which implements the ILayer interface. The interface allows switching quickly from one layer to another:

    private ILayer layer;
    this.layer = new Layer1(dataSource);
    this.layer = new Layer2(dataSource);

Layers have to implement the Paint method. This method draws every element in the VideoControl for each frame. The first painted elements end up at the back and the last ones at the front in case of overlapping, so the drawing order of the elements must be considered early on. The hand feedback can display a lot of information. This information is not always useful and can take too much space on the screen, so it is good to disable it when it is not needed:

    this.ShowConvexHull = false;
    this.ShowContour = true;
    this.ShowFingerDepth = false;
    //DrawFingerPoints(hand, g);
    //this.DrawCenter(hand, g);
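To give an idea of what a concrete layer looks like, here is a heavily simplified sketch. The LayerBase base class, the Paint signature and the hand member names (CurrentValue, Hands, PalmX, PalmY) are assumptions about the library's API and would need to be adapted to the real base class:

    using System.Drawing;

    // Hypothetical feedback layer drawing a dot on each detected palm.
    public class PalmDotLayer : LayerBase
    {
        private readonly IHandDataSource dataSource;

        public PalmDotLayer(IHandDataSource dataSource)
        {
            this.dataSource = dataSource;
        }

        public override void Paint(Graphics g)
        {
            var frame = this.dataSource.CurrentValue;
            if (frame == null) return;
            foreach (var hand in frame.Hands)
            {
                // Small red dot at the palm centre as a lightweight visual feedback.
                g.FillEllipse(Brushes.Red, hand.PalmX - 4, hand.PalmY - 4, 8, 8);
            }
        }
    }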
Chapter 4: Gestures Design

4.1 Context
4.2 Selections & Release
4.3 Rotations
4.4 Zooms

This chapter is about design. It is a quick summary of all the gestures designed during the project; each one is described briefly, and chapter 5 provides more detailed information.
4. Gestures Design

4.1 Context
This section details each gesture designed during the project. The gestures are distributed in three tables corresponding to the commands they were designed for. A design is presented in three parts. First comes its name, which tries to describe the gesture as well as possible, along with a graphical sketch showing the mechanics of the movement. The second part is a textual description of how the gesture is executed. The third part is a list of advantages and disadvantages of using and developing the gesture. The gestures are presented in an order that shows the evolution of the design.
Selections & Release

Thumb click
  Description: The thumb of the left hand must do a little "click" by disappearing and reappearing.
  Pros: simple to use; easy to detect by the camera.
  Cons: slow; the thumb must be identified; fatiguing.

Finger pinch
  Description: The thumb and the index must touch each other to activate the selection and hold the position.
  Pros: natural, easy and not tiring.
  Cons: the hand must be turned a little to the side so the fingers are clearly detected.

Hand grab
  Description: Close the hand as if grabbing the object, then release it by opening the hand again.
  Pros: easy to use, very easy to detect; accurate, not tiring in the short term.
  Cons: can be tiring in the long term.

Push down repeated
  Description: Horizontal hand posture. Sudden vertical downward movement directly followed by the opposite sudden upward movement.
  Pros: light and natural movement.
  Cons: hard to detect and implement (swipes); hard to single out from other moves; horizontal posture is hard to detect.

Push down & hold
  Description: Horizontal hand posture. Sudden vertical downward movement, then hold the position to stay active. Upward movement to release the command.
  Pros: light and natural movement; less movement than previously.
  Cons: horizontal posture hard to detect; hard to detect and implement (swipes); hard to single out from other moves.

Push down & hold 2
  Description: Vertical downward movement into an activation area, then hold the position to stay active. Upward movement out of the area to release the command.
  Pros: no posture needed, position prevails; easy to use, accurate; easier to implement.
  Cons: less natural; less flexible (fixed area for detection).

Table 1: Selection gestures design
Zooms

Vertical slide
Description: Sliding vertically with two fingers. Upward to zoom in and downward to zoom out.
Pros: + easy to use and implement.
Cons: - not very accurate. - limited movement → repositioning.

Finger spread
Description: Take the distance between index and thumb. If the distance increases, it zooms in. If it decreases, it zooms out.
Pros: + natural and intuitive.
Cons: - very hard to use. - not enough space for accuracy. - limited movement, hard repositioning. - release of command difficult.

Continuous vertical slide
Description: The zoom is activated when a minimum vertical distance from the reference point is reached. The further, the faster. The zoom is continuous until the finger is back in the neutral zone.
Pros: + easy to use. + continuous gesture → no repositioning. + variable speed. + neutral zone.
Cons: - learn to use the variable speed.

Depth slide alternative
Description: Two activation areas, one in front to zoom in and one in the back to zoom out. Zooms are continuous, no variable speed.
Pros: + easy to use and implement. + no initial point. + continuous gesture → no repositioning.
Cons: - not flexible. - not natural (pushing & pulling).

Depth slide continuous
Description: Move forward to zoom in and backward to zoom out. The distance from the initial position sets the incremental speed.
Pros: + easy to use. + continuous gesture → no repositioning. + variable speed.
Cons: - learn to use the variable speed. - hard to be accurate.

Depth slide progressive
Description: Move forward to zoom in and backward to zoom out. The distance from the initial position sets directly the zoom value.
Pros: + feels natural. + easy to use. + easy to be accurate.
Cons: - limited movement → repositioning. - less flexible due to steps.

Circles detection with vertical posture
Description: Hand in vertical position. Circular movements are detected. Clockwise to zoom in and counter-clockwise to zoom out.
Pros: + natural gesture, easy posture. + no initial point. + continuous gesture → no repositioning.
Cons: - hard to detect circles (true-negative). - vertical posture hard to detect.

Rope technique, step by step
Description: Measure the angle between the reference point and the finger. A zoom is done if the difference between the actual angle and the last one is big enough.
Pros: + easy to use and feels natural. + no need of circle detection. + continuous gesture → no repositioning. + accurate and immediate results.
Cons: - positioning around initial point.

Table 2: Zoom gestures design
Rotations

Vertical slide
Description: Sliding vertically with three fingers. Upward to rotate clockwise and downward to rotate counter-clockwise.
Pros: + easy to use and implement.
Cons: - not very accurate. - limited movement → repositioning.

Horizontal slide
Description: The rotation is activated when a minimum horizontal distance from the reference point is reached. The further, the faster. The rotation is continuous until the finger is back in the neutral zone.
Pros: + easy to use. + continuous gesture → no repositioning. + variable speed. + neutral zone.
Cons: - learn to use the variable speed.

Circles detection
Description: Circular movements are detected. Clockwise for a positive rotation and counter-clockwise for a negative one.
Pros: + natural gesture, no posture. + no initial point. + continuous gesture → no repositioning.
Cons: - hard to detect circles (true-negative). - conflicts with other movements.

Circles detection with palm posture
Description: Open hand facing the camera. Circular movements are detected. Clockwise for a positive rotation and counter-clockwise for a negative one.
Pros: + natural gesture, easy posture. + no initial point. + no more conflicts due to the posture. + continuous gesture → no repositioning.
Cons: - hard to detect circles (true-negative).

Rope technique, absolute angle
Description: Measure the angle between the reference point and the finger. The object rotates directly to the absolute angle value.
Pros: + no need of circle detection. + continuous gesture → no repositioning. + objects follow the finger. + accurate and immediate results.
Cons: - positioning around initial point. - very hard to use without experience.

Rope technique, step by step
Description: Measure the angle between the reference point and the finger. A rotation is done if the difference between the actual angle and the last one is big enough.
Pros: + easy to use and feels natural. + no need of circle detection. + continuous gesture → no repositioning. + accurate and immediate results.
Cons: - positioning around initial point.
Table 3: Rotation gestures design
Chapter 5 Gesture Recognition Development
5.1 Iconic gesture 1 ............................................................................. 36
5.2 Iconic gesture 2 ............................................................................. 39
5.3 Iconic gesture 3 ............................................................................. 41
5.4 Iconic gesture 4 ............................................................................. 42
5.5 Technologic gesture 1 ................................................................... 44
5.6 Technologic gesture 2 ................................................................... 46
5.7 Technologic gesture 3 ................................................................... 47
5.8 Technologic gesture 4 ................................................................... 48
The gestures are fully described with their pros, cons and insights. They are divided into groups, each covering several commands with at most one gesture per command type. The order of presentation tries to follow the evolution of the gestures, as in chapter 4, whenever possible.
5. Gesture Recognition Development
In order to create efficient natural user interfaces, the idea is to design multiple gesture recognitions for the three tested commands: rotation, zoom and selection. Once a gesture is operational, it can be tested to reveal its pros and cons; it is then adapted, improved or dropped. The gestures are put together in groups of three, one for each command, according to their similarities when possible. Some groups are improvements of older ones, and not all groups contain unique types of gesture recognition: some gestures are reused. For the final evaluation with real testers, a selection of two groups of the most suitable gestures for each task is chosen, because the evaluation takes quite some time to go through and the testers cannot be held forever. There are two types of gestures. The iconic ones, supposed to be closer to natural human gestures, come more easily to a user's mind to accomplish a task with a given command. The second group is the technologic gestures; they are closer to the machine side, less natural but easier to process. Most of the time, a gesture can be divided into three parts. The first is the activation, often a posture of the hand (vertical, horizontal, etc.) or a definite number of fingers; it tells the machine which command has to be processed. The second part is the execution, which is some kind of movement. Some movements have to keep the activation posture to stay active all along, others do not; it usually depends on whether the gesture can be confused with other gestures. The third part is the release, usually used to avoid the activation of another gesture during execution. It is also commonly a posture or a definite number of fingers.
5.1 Iconic gestures 1
5.1.1 Description
The idea is to find natural gestures that would come intuitively to mind and see whether those gestures can be efficient in practice. Tactile screen gestures were used as inspiration. For the zoom, two fingers, the thumb and the index finger, move closer together or further apart: the bigger the distance, the bigger the zoom, and vice versa. For the rotation, a circular gesture is used: the system detects whether the user's hand movement draws a circle and, if so, rotates the object by a fixed angle. The selection is done with a small vertical swipe, a quick downward movement of the hand in the horizontal position, like pushing a button. The gesture has to be done once to select and a second time to unselect.
Figure 10: Zoom, Rotation & Selection of Iconic 1
5.1.2 Technical and Operational Advantages & disadvantages Zoom:
The implementation is easy: it just computes the distance between the thumb and the index finger. The difficulty is identifying the actual thumb; the "findThumb" method takes care of it, or the system can simply rely on the only two fingers detected. Once the two finger positions are known, the distance between them is the reference: if it grows, it zooms in, and if it shrinks, it zooms out. The problem is that this is a finite movement. When the fingers touch each other or are spread to their maximum, it is over; to zoom in or out further, the fingers have to be reinitialized. That is where it gets tricky. The user cannot simply detach his fingers, or it will zoom in and all progress will be lost. On a tactile screen this is easy: stop touching the screen and restart. With a camera, the user has to release the gesture, either by getting out of the camera's view or by adopting a special posture that cannot be mistaken for a zoom-in or zoom-out gesture. In practice, releasing the gesture is really painful and frustrating. It is also hard to be precise because the sensitivity is high due to the short maximum distance between the fingers. The sensitivity can be reduced by slowing down the zoom speed, but then it takes more attempts to reach the desired size.
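A minimal sketch of such a distance-based zoom, assuming the two finger points are already available; the method, parameter names and sensitivity value are illustrative, not the project's code.

using System;
using System.Drawing;

static class SpreadZoom
{
    // Returns a zoom factor from the change in the thumb-index distance.
    // 'sensitivity' damps the gesture so it is less twitchy (value assumed).
    public static double Update(PointF thumb, PointF index,
                                ref double referenceDistance, double sensitivity = 0.01)
    {
        double d = Math.Sqrt(Math.Pow(index.X - thumb.X, 2) + Math.Pow(index.Y - thumb.Y, 2));
        if (referenceDistance <= 0) { referenceDistance = d; return 1.0; } // first frame: no zoom yet
        double factor = 1.0 + (d - referenceDistance) * sensitivity;       // spreading -> >1, pinching -> <1
        referenceDistance = d;
        return factor;
    }
}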
Rotation: To do a rotation, the left hand has to "draw" a full circle; each detected full circle rotates the object by a fixed number of degrees. Points of the hand's current and past positions are required to recognize a path; they are stored in a circular buffer. To detect a circle, every point is taken into account. The algorithm computes the centroid, the center of mass, which is taken as the center of the circle. It then computes the average distance between the centroid and each point, which gives the radius of the ideal circle. Finally, it checks that more than 80% of the points lie near that radius (+/- 10%) [FIGURE 11]. If they do, a rotation of a fixed number of degrees is performed. To know whether the rotation has to be clockwise or counter-clockwise, the computation of the centroid comes in handy: the "findCentroid" method computes a polygon area from the points, and if that area is positive the rotation is clockwise, and vice versa. The problems with this method are multiple. First, it takes a lot of points to detect a full circle, and users are not always making circles, so the path of the points is unpredictable. The algorithm has to find a full circle in a cloud of points, but the more points there are, the harder it gets, because easily more than 20% of them fall outside the perfect circle range [FIGURE 11]. So the user has to do at least two full circles to align all points with the theoretical circle. It also takes many detected full circles to do a big rotation, as every detection rotates the object by one small step only. Finally, it takes longer to detect full circles if the path is longer. In practice, it is pretty hard to draw a correct circle in the air; most of the time the result looks more like an ellipse, which makes it difficult to detect. This makes the technique
hard to use. Another issue has to be considered: the first full circle is hard to get, but afterwards every new point located in the circle keeps producing a full circle as long as 80% of the points stay in it. So, when the gesture is over, the object continues to rotate until at least 20% of the points leave the circle, which makes it hard to be accurate, hence the need for a smaller buffer to reduce this effect.
Figure 11: Full circle detections
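A condensed sketch of the circle test described above (centroid, average radius, 80% of the points within +/-10%, signed polygon area for the direction). The thresholds and names are illustrative; the centroid is approximated here by the average of the points, which is a simplification of the "findCentroid" computation.

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

static class CircleDetector
{
    // Returns +1 or -1 for the two winding directions, or 0 when no circle is found.
    // Note: which sign means "clockwise" depends on the screen coordinate system.
    public static int DetectCircle(IList<PointF> points)
    {
        if (points.Count < 8) return 0;

        // Centroid of the path, used as the circle centre.
        float cx = points.Average(p => p.X);
        float cy = points.Average(p => p.Y);

        // Average distance to the centroid = radius of the ideal circle.
        double radius = points.Average(p => Dist(p, cx, cy));

        // At least 80% of the points must lie within +/-10% of that radius.
        int near = points.Count(p => Math.Abs(Dist(p, cx, cy) - radius) <= 0.10 * radius);
        if (near < 0.80 * points.Count) return 0;

        // Signed polygon area gives the winding direction.
        double area = 0;
        for (int i = 0; i < points.Count; i++)
        {
            PointF a = points[i], b = points[(i + 1) % points.Count];
            area += a.X * b.Y - b.X * a.Y;
        }
        return area > 0 ? 1 : -1;
    }

    private static double Dist(PointF p, float cx, float cy)
        => Math.Sqrt((p.X - cx) * (p.X - cx) + (p.Y - cy) * (p.Y - cy));
}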
Selection: Using a swipe gesture is more complicated to implement than it would appear. First it needs a circular buffer of past positions. Then it needs to find stability in the movement, to be sure it does not interfere with another gesture. The stability is relative, as the hand always shakes a little, so the criterion needs to be flexible. If the signal has been stable for a while, swipes can be identified: to detect a vertical swipe, the system checks the points' history for an important vertical difference between the points. The results are mixed; it works, but not always when it should. There are several problems. First, the stability of the movement is hard to manage: if the criterion is too flexible, the system is always "stable" and detects too many swipes; if it is not flexible enough, it never tries to detect any swipe. Then the swipes themselves are hard to detect. The slope of the points can be used to deduce vertical speeds and accelerations, but it is hard to get right. In practice, the selection is unsatisfying due to its randomness.
5.1.3 Quick Summary
Zoom. Activation: 2 fingers. Execution: spread the fingers. Release: anything but 2 fingers.
Rotation. Activation: none. Execution: make circles. Release: stop making circles.
Selection. Select: swipe down the hand. Unselect: repeat the movement (swipe down the hand). Notes: the selection stays on until a second swipe down is detected.
Table 4: Iconic gestures 1 summary
5.2 Iconic gestures 2
5.2.1 Description
As the zoom was not a success in iconic gestures 1, this version tries to improve it. Gestures with limited movements, like spreading the fingers or stretching out the arm, always bring trouble when it comes to getting back to the initial position. This time a continuous gesture is used, like the previous rotation with its circular movement. It is still based on circle detection, but an improved version. As the movements are similar, the postures of the hand differentiate the commands: a vertical hand on the side (slim) for the zoom, an open hand (large) with the palm facing the camera for the rotation, and a horizontal hand (palm facing the ground) for the selection. The selection is an alternate version of the previous one. Instead of repeating the gesture twice, once for selecting and once for unselecting, this iteration breaks the gesture in half: a quick small swipe down, still with the hand in a horizontal posture, followed by staying down activates the selection. The selection stays active as long as the hand holds its down position. To release the selection, the hand has to quit its down position by moving up.
Figure 12: Zoom, Rotation & Selection of Iconic 2
5.2.2 Technical and Operational Advantages & disadvantages
Posture recognition: Candescent NUI provides useful data on the volume of the hand; the width and the height are easy to get. A few simple threshold tests then establish the hand posture. The whole point of this technique is to pick the right thresholds, and that is where it gets delicate. Overall the method is pretty accurate, but there are a few flaws. First, the vertical posture is based on a large height and a thin width: if the thumb sticks out too much to the side, the width increases too, and if it crosses the width threshold the hand is wrongly considered as an open flat hand. Also, depending on the depth position of the hand, the values are much smaller when the hand is far from the camera and bigger when it is close. For the horizontal posture, the problem lies in the wrist. It is easy to check that the height is small, because it is much smaller than for the other
postures. The problem is that sometimes the hand is so horizontal that it is not detected anymore and the wrist is taken for the hand [FIGURE 13], which returns false results.
Figure 13: horizontal hand not detected
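A schematic version of the threshold test on the hand's bounding box; the enum, method name and threshold values are placeholders for illustration and would have to be tuned (and ideally scaled with depth), as noted above.

enum HandPosture { Unknown, Vertical, OpenFlat, Horizontal }

static class PostureClassifier
{
    // Width and height come from the hand bounding box reported by the tracking library.
    public static HandPosture Classify(float width, float height)
    {
        if (height < 80)                  return HandPosture.Horizontal; // flat hand: very small height
        if (height > 150 && width < 90)   return HandPosture.Vertical;   // slim, tall silhouette
        if (height > 150 && width >= 90)  return HandPosture.OpenFlat;   // open palm facing the camera
        return HandPosture.Unknown;
    }
}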
Circle Detection: As the first attempt to detect circles was not a success, the second one is more permissive. As the user always moves his hand in the same area, an average centroid is calculated. Like before, all the past positions in the circular buffer are used to compute the current centroid, which is put into an array of a few past centroids that determines the current average centroid. This method stabilizes the centroid point and avoids radical changes caused by badly detected points. It also helps reduce the number of points needed for the circle detection, as the centroid is less volatile, and thereby takes less time to compute. Full circle detections were demanding and exhausting, so to make the detection faster only a part of a circle is required. By reducing the number of points, the path of the past points stays closer to the actual activity of the hand: when a circular movement starts, it is almost instantly recognized. The problem of the rotation continuing after the actual movement is still there, but it has been reduced by the smaller number of points. In practice the detection of circles is easy and fast, but it sometimes, rarely but still, erroneously detects circles where it should not. To reduce this issue, the percentage of points required to be in the circle can be increased, but that makes the detection stiffer.
Figure 14: Circle movement detected
Selection: This is a second version of the selection with a downward movement. This time, the user has to do a small swipe down, hold it down and finally swipe up to release. This brings the same problems as before and adds more: the stability of the hand has to be checked at the beginning of the gesture, and once again at the end to ensure the movement is finished, so that it cannot be mistaken for another movement and so that it can serve as a base for the swipe up releasing the selection. As the hand movement is hard to keep stable, the whole technique ends up being unstable.
5.2.3 Quick Summary
Zoom. Activation: vertical hand on the side. Execution: make circles. Release: stop making circles.
Rotation. Activation: full hand, palm facing the camera. Execution: make circles. Release: stop making circles.
Selection. Select: swipe down and hold the down position. Unselect: move the hand back up. Notes: the selection stays on until the down position is quit.
Table 5: Iconic gestures 2 summary
5.3 Iconic gestures 3
5.3.1 Description
The last two selection gestures using swipes were not successful. This version keeps the idea of pushing the hand down to do a selection and pushing it up to release, but this time, instead of using swipe movements, it uses a fixed limit. To do the selection, the palm of the hand just needs to go under the limit. The object stays selected as long as the palm stays in the zone and does not cross the limit again.
Figure 15: Selection for Iconic gestures 3
5.3.2 Technical and Operational Advantages & disadvantages
Selection: The implementation is easier compared to the other versions. It requires a threshold delimiting the activation area. The threshold is pushed a little bit up when crossed to avoid instability: if a user stops just after the limit, he risks crossing back involuntarily because of the shakiness of the hand. Once the threshold is crossed again the other way, it gets back to its normal height. The threshold uses the hand's current position (the green dot) as indicator. In practice it works fine; it is less demanding than the previous version and it is more robust and
reliable. On the downside, the system has to make sure the selection cannot be activated while another command is active, in case the hand position enters the activation area. To avoid that, some locks come along with the gesture recognitions.
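A minimal sketch of the activation limit with the small hysteresis described above; the class name and the numeric values are assumptions made for the example.

class ZoneSelector
{
    private readonly float baseLimit;    // y coordinate of the activation limit
    private readonly float hysteresis;   // how far the limit is pushed up once crossed
    private bool selected;

    public ZoneSelector(float baseLimit, float hysteresis = 15f)
    {
        this.baseLimit = baseLimit;
        this.hysteresis = hysteresis;
    }

    // palmY is the current vertical position of the palm (the green dot).
    // Screen coordinates grow downward, so "below the limit" means palmY > limit.
    public bool Update(float palmY)
    {
        float limit = selected ? baseLimit - hysteresis : baseLimit;  // limit raised while selected
        selected = palmY > limit;
        return selected;
    }
}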
5.3.3 Quick Summary
Selection. Select: move the hand down into the blue zone. Unselect: move the hand up out of the blue zone. Notes: the selection stays on until the down position has quit the zone.
5.4 Iconic gestures 4
5.4.1 Description
The idea of making circles, being a continuous movement, is kept again. However, this iteration rebuilds the detection method from the ground up. This time no centroid is calculated: a reference point is fixed at the activation of the zoom or rotation command. Once the reference point is fixed, the user just needs to move his hand around it; like a rope attached to a post, it turns around automatically, hence the name "rope technique". From the hand's position and the reference point, the current angle is computed. The user gains precision as he moves away from the reference point; conversely, with a short rope (the distance between the reference point and the finger) it gets harder to be precise, so a minimum radius is required. For the zoom, activation is done by showing only two fingers to the camera; the user then turns around the reference point, clockwise to zoom in and counter-clockwise to zoom out. The rotation is done naturally: one finger to activate it and then, like the zoom, a turn around the reference point. The object is rotated to the current absolute angle, which means it follows the hand position directly, giving complete and direct control of the object's angle. The selection is done by closing the hand and holding it closed, like grabbing something, holding it, and then releasing it by dropping it.
Figure 16: Zoom, Rotation & Selection of Iconic 4
5.4.2 Technical and Operational Advantages & disadvantages
The rope technique: First it needs a reference position. A fixed point in the middle of the detection area was considered and tested; unfortunately it causes trouble, as it demands more effort and larger movements to execute the command, and detection-wise the problems appear when the user approaches the limits of the detection area. So a mobile reference point is used instead: this position is created every time the command is activated and removed when the command is released. To know the exact angle between the reference and the current position, a simple trigonometric formula comes in handy. First get the distance between the two points to determine the circle radius. Then take the position of the point at zero degrees, which is the reference point plus the radius on the x-axis. Next, compute the distance between this new point and the current point (a). Finally, compute the angle with the law of cosines [FIGURE 17], convert it from radians to degrees, and replace negative angles by their positive equivalent to obtain angles from 0 to 359. This technique is simple, quick and accurate.
Figure 17: rope technique angle example
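A compact version of that computation, as a sketch: it follows the law-of-cosines construction described above, and adds a sign check on the vertical side of the finger (an assumption of this example) to extend the result from 0-180 to the full 0-359 range.

using System;
using System.Drawing;

static class RopeTechnique
{
    // Angle (0-359 degrees) of 'finger' around 'reference', in the spirit of Figure 17.
    public static double Angle(PointF reference, PointF finger)
    {
        double r = Dist(reference, finger);                              // rope length = circle radius
        if (r < 1e-6) return 0;

        PointF zero = new PointF(reference.X + (float)r, reference.Y);   // point at zero degrees
        double a = Dist(zero, finger);                                   // chord to the current point

        // Law of cosines on the isosceles triangle with sides (r, r, a).
        double cos = (2 * r * r - a * a) / (2 * r * r);
        double deg = Math.Acos(Math.Max(-1, Math.Min(1, cos))) * 180.0 / Math.PI;  // 0..180

        // Screen y grows downward: use the half-plane of the finger to cover 0..359 (assumed convention).
        return finger.Y <= reference.Y ? deg : (360 - deg) % 360;
    }

    private static double Dist(PointF p, PointF q)
        => Math.Sqrt((p.X - q.X) * (p.X - q.X) + (p.Y - q.Y) * (p.Y - q.Y));
}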
Zoom: The zoom uses the rope technique. It gets the angle but does not use it directly: it compares it with the last used angle. If the difference between them is larger than a predetermined margin, the zoom is carried out; if the difference is positive it zooms in, and vice versa. The margin, also called "zoomStep", provides smoothness and accuracy: without it the zoom would be too sensitive and very difficult to control. The past angle is only stored when the angle has crossed the margin limits. A large zoomStep brings accuracy and avoids shakiness problems, but makes the zoom slower to operate. In practice, once activated, the zoom is easy to use and feels natural, with good precision, but learning to use it can be hard because the hand placement is pretty demanding.
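The margin logic might look like the following sketch; the zoomStep value is an assumption, and the wrap-around handling at 0/359 degrees is added here for completeness.

using System;

class RopeZoom
{
    private const double ZoomStep = 12.0;    // degrees the angle must travel before one zoom step (assumed)
    private double lastAngle;
    private bool hasLast;

    // Returns +1 to zoom in, -1 to zoom out, 0 to do nothing.
    public int Update(double currentAngle)
    {
        if (!hasLast) { lastAngle = currentAngle; hasLast = true; return 0; }

        double diff = currentAngle - lastAngle;
        if (diff > 180) diff -= 360;          // take the shortest way around the circle
        if (diff < -180) diff += 360;

        if (Math.Abs(diff) < ZoomStep) return 0;   // inside the margin: ignore shakes

        lastAngle = currentAngle;             // only stored when the margin is crossed
        return diff > 0 ? 1 : -1;
    }
}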
Rotation: it simply takes the angle value from the rope technique and transmits it directly to the object to rotate it. This actually works very well, but it is too precise: any small movement is perceived and changes the angle, making it hard to reach the target angle exactly. To prevent that issue, the idea is to reduce the accuracy of the method. By adding an "angleStep", which is nothing more than a denominator, the precision is limited. For example, with an angleStep of 10, the output angle can only be a multiple of 10, giving the object 36 possible orientations. This gives more room to move between steps and avoids the shakiness syndrome; on the other hand, all the angles in between are lost. In practice, the rotation requires a little adaptation
because the result of the movement is immediate and the user has to learn to use it properly. It takes quite some time for some users to acquire the necessary skill. Once learned, it is the quickest way to rotate an object: just point at the wanted angle and the object instantly follows.
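The quantization itself can be as simple as an integer division, as in this sketch (angleStep of 10 assumed, as in the example above):

static class RopeRotation
{
    // Snap the rope angle to multiples of angleStep so small shakes do not move the object.
    // With angleStep = 10 the object can only take the orientations 0, 10, 20, ..., 350.
    public static int QuantizeAngle(double angleDegrees, int angleStep = 10)
    {
        return (int)(angleDegrees / angleStep) * angleStep;
    }
}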
Selection: The idea of grabbing and holding an object to move it feels natural. On the technical side, it is quite simple to implement: if the hand is present and has no fingers (a fist), the hand is assumed to be closed and the selection is activated until fingers appear. It is also easy to detect, which makes it reliable and robust. In practice, it performs very well. The only downside is the physical effort it demands in the long term: making a fist takes a not inconsiderable effort, and if the user has to repeat the gesture over and over again, his hand gets exhausted. In the short term, it is very efficient.
5.4.3 Quick Summary
Zoom. Activation: 2 fingers. Execution: turn around the reference point. Release: display more than 2 fingers.
Rotation. Activation: 1 finger. Execution: turn around the reference point. Release: display more than 2 fingers.
Selection. Select: close the hand. Unselect: open the hand. Notes: the selection stays on until the hand opens.
Table 6: Iconic gestures 4 summary
5.5 Technologic gestures 1
5.5.1 Description
The commands are controlled by sliding the hand vertically. The system takes the vertical position of the highest finger and compares it with its last measurement; the vertical difference increases or decreases the value
for the designated task. The number of fingers defines the command. The selection is activated by a "click" of the thumb: the thumb must be visible, then disappear for a few milliseconds, and then reappear [FIGURE 18].
Figure 18: Thumb click operations
5.5.2 Technical and Operational Advantages & disadvantages
Sliding: Technically it is simple to implement: all it needs is the current vertical value compared to its past value. A small margin, or threshold, reduces the sensitivity during operation. The problem with this technique appears when the hand reaches the top of the camera's view: it cannot go any further, so it has to reposition itself lower and restart the sliding movement. Unlike on tactile screens, where the user stops the recognition by releasing the touch screen, there is no way to stop the recognition other than leaving the camera's view. So the user has to release the activation posture, reposition correctly, reactivate the command with the activation posture, and repeat all those actions until he reaches his goal. This is pretty demanding and exhausting.
Thumb click: First the thumb of the left hand is detected with findThumb. The state changes depending on whether the thumb is present or not; four states are needed to operate correctly [FIGURE 19]. The initial state avoids a false click at start. A timer starts every time the thumb disappears; if the thumb reappears within 800 milliseconds, the selection is activated. This delay is long enough to allow a smoother thumb movement and to be less exhausting over time. There is also a minimum time of 250 milliseconds before the thumb may reappear, in case the thumb is lost unexpectedly for a few milliseconds. Even though it is simple to detect with very good accuracy and easy for the users to perform, the thumb click is surprisingly not really appreciated by users: they find the gesture fatiguing and lame.
Figure 19: Thumb click state diagram
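A rough sketch of such a state machine, with the 250 and 800 millisecond windows from the text; the state names and exact transitions are guesses made for illustration, not a transcription of Figure 19.

using System;

enum ThumbState { Initial, ThumbVisible, ThumbHidden, Clicked }

class ThumbClickDetector
{
    private ThumbState state = ThumbState.Initial;
    private DateTime hiddenSince;

    // Call once per frame with the current thumb visibility; returns true when a click is recognized.
    public bool Update(bool thumbVisible)
    {
        switch (state)
        {
            case ThumbState.Initial:                                  // avoids a false click at start
                if (thumbVisible) state = ThumbState.ThumbVisible;
                return false;

            case ThumbState.ThumbVisible:
                if (!thumbVisible)
                {
                    state = ThumbState.ThumbHidden;
                    hiddenSince = DateTime.Now;                       // the timer starts here
                }
                return false;

            case ThumbState.ThumbHidden:
            {
                if (!thumbVisible) return false;
                double ms = (DateTime.Now - hiddenSince).TotalMilliseconds;
                bool click = ms >= 250 && ms <= 800;                  // time windows taken from the text
                state = click ? ThumbState.Clicked : ThumbState.ThumbVisible;
                return click;
            }

            case ThumbState.Clicked:                                  // click reported; wait for the next one
                state = ThumbState.ThumbVisible;
                return false;

            default:
                return false;
        }
    }
}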
5.5.3 Quick Summary
Zoom. Activation: 2 fingers. Execution: vertical slide. Release: anything but 2 fingers.
Rotation. Activation: 3 fingers. Execution: vertical slide. Release: anything but 3 fingers.
Selection. Select: thumb click. Unselect: thumb click. Notes: the selection stays on until a second thumb click is detected.
Table 7: Technologic gestures 1 summary
5.6 Technologic gestures 2
5.6.1 Description
This is an improvement of the zoom from technologic 1. The idea is to use depth for the zoom. There is no need for an activation: the hand just has to reach a predefined area to zoom in or out. The "zoom in" area is located in the front, so the user has to stretch his arm; the "zoom out" area is on the opposite side in the back, and the user has to retract his arm to reach it. Inside the areas, the zoom increases or decreases continuously at a constant speed. To stop zooming in or out, the user just has to leave the area.
Figure 20: Zoom technologic 2
5.6.2 Technical and Operational Advantages & disadvantages
This method is easy to implement. The movement is continuous, so there is no need to reinitialize the command to finish it: just define the boundaries to reach to execute the zooms. The closer they are, the less the arm gets tired, but the more often the zoom is activated accidentally. Then the current Z position of the hand is simply compared with the boundaries. As written before, this technique is really exhausting for the arm.
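The comparison itself is a two-threshold check on the hand's Z position, as in this sketch (the boundary values and the orientation of the Z-axis are assumptions):

static class DepthZoneZoom
{
    // Z is assumed to grow away from the camera; boundary values are placeholders in millimetres.
    private const float FrontLimit = 600f;   // closer than this: zoom in
    private const float BackLimit = 900f;    // farther than this: zoom out

    // Returns +1 (zoom in), -1 (zoom out) or 0 (neutral zone between the two limits).
    public static int Update(float handZ)
    {
        if (handZ < FrontLimit) return 1;
        if (handZ > BackLimit) return -1;
        return 0;
    }
}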
5.6.3 Quick Summary
Zoom. Activation: none. Execution: reach the front or back limit by sliding on the Z-axis. Release: none.
Table 8: Technologic gestures 2 summary
5.7 Technologic gestures 3
5.7.1 Description
To further improve the previous zoom techniques, a progressive zoom based on depth is introduced. This time, instead of having areas to reach to activate the zoom in or out, the zoom increases or decreases progressively according to the initial position of the hand at activation [FIGURE 21]. The more the user stretches out his arm, the more it zooms in, and pulling the arm back zooms out.
Figure 21: Progressive zoom from technologic 3
5.7.2 Technical and Operational Advantages & disadvantages
Technically it is more complex to implement. First it needs an activation posture to set the initial position; it cannot be hard-coded, otherwise the zoom would increase or decrease without control directly at activation. It also cannot simply take the difference between the initial position and the current position and zoom accordingly, because the system would be too sensitive. So it needs steps to attenuate the hand's imprecision and shakes [FIGURE 22]. The step size and number are configurable, along with the zoom speed. Bigger steps give more stability in the movement, but the arm has to stretch out more and the full gesture might have to be restarted several times to reach the goal. Another problem occurs if the user initiates the command near the limits of detection: it does not work as well as when it is started in the middle of the safe detection area on the z-axis. The steps are an array of Booleans; each time a step is crossed, it becomes true. If the system takes absolute values [FIGURE 23], it just returns the value calculated from the zoom speed. If the system takes continuous values [FIGURE 23], meaning that while all conditions are met it keeps sending the value to grow the object little by little, then it needs a second array of the past situation to compare with the current situation and to send the value only once when something differs. In practice there is no real difference between the two techniques. Like its predecessors, this technique is physically exhausting and demands a lot of concentration; otherwise it feels natural and is pretty accurate.
Figure 22: Progressive zoom with steps
Figure 23: Progressive zoom examples
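As a sketch of the continuous variant, the Boolean-array bookkeeping described above can be reduced to tracking how many steps have been crossed since activation; this simplification, the step size and the sign convention on the Z-axis are assumptions of the example.

class ProgressiveDepthZoom
{
    private readonly float stepSize;   // depth the hand must travel to cross one step (value assumed)
    private float initialZ;
    private int lastStepsCrossed;

    public ProgressiveDepthZoom(float stepSize = 40f)
    {
        this.stepSize = stepSize;
    }

    // Called when the activation posture is detected: the current hand depth becomes the reference.
    public void Activate(float handZ)
    {
        initialZ = handZ;
        lastStepsCrossed = 0;
    }

    // Returns how many new steps were crossed this frame: positive = zoom in, negative = zoom out.
    public int Update(float handZ)
    {
        // Stretching the arm brings the hand closer to the camera (smaller Z), so it zooms in.
        int stepsCrossed = (int)((initialZ - handZ) / stepSize);
        int delta = stepsCrossed - lastStepsCrossed;   // only report changes, like the compared arrays
        lastStepsCrossed = stepsCrossed;
        return delta;
    }
}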
5.7.3 Quick Summary
Zoom. Activation: 1 finger. Execution: slide on the Z-axis. Release: anything but 1 finger.
Table 9: Technologic gestures 3 summary
5.8 Technologic gestures 4
5.8.1 Description
Keeping the idea of sliding, this iteration brings vertical and horizontal sliding together in one single gesture from a reference point. It looks like a cross, hence its name: the "cross technique". The zoom and rotation are done by continuous movements. The gesture is activated with one finger, which creates a reference position at its location. To zoom in, the finger has to slide up; to zoom out, it slides down. The rotation works the same way but horizontally: slide left to rotate clockwise and right to rotate counter-clockwise. A neutral zone is delimited by a red square around the reference point, and the finger has to cross its limits to activate one of the commands. The command speed is variable: the further the finger is from the limit, the faster the command is executed. To release the command, the hand is simply opened wide. For the selection, a pinching gesture is used. It is supposed to feel natural, like picking up an object, moving it and then dropping it. The hand posture has to be at least a little bit on the side for the camera to detect the pinching correctly; if the hand faces the camera directly, the fingers are not correctly detected, which creates instability.
Figure 24: Zoom, Rotation & Selection of Technologic 4
5.8.2 Technical and Operational Advantages & disadvantages
Zoom and rotation: The neutral zone gives the system some stability. It needs to be as small as possible to reduce the distance required to execute a command, but big enough to avoid shakiness perturbations and mixing up the commands. Only one command can be executed at a time; to change the command, the finger has to get back into the neutral zone. Between the vertical and horizontal commands, the chosen command is always the one along the axis with the greater distance (vertical or horizontal) from the reference point. In practice, the cross technique is easy to learn but hard to master because of its variable speed system: the variable speed loses accuracy when it is not controlled properly, but with skill it saves time.
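A sketch of how the neutral zone and the "largest distance wins" rule could be coded; the zone size, the enum and the direction mapping are assumptions made for the example (screen Y is taken to grow downward).

using System;
using System.Drawing;

enum CrossCommand { None, ZoomIn, ZoomOut, RotateClockwise, RotateCounterClockwise }

static class CrossTechnique
{
    private const float NeutralHalfSize = 40f;   // half the side of the red neutral square (assumed)

    // 'reference' is fixed at activation; 'finger' is the current finger position.
    public static CrossCommand Classify(PointF reference, PointF finger)
    {
        float dx = finger.X - reference.X;
        float dy = finger.Y - reference.Y;

        // Inside the neutral zone: no command, and the active command may be changed.
        if (Math.Abs(dx) <= NeutralHalfSize && Math.Abs(dy) <= NeutralHalfSize)
            return CrossCommand.None;

        // The axis with the larger distance from the reference point wins.
        if (Math.Abs(dy) >= Math.Abs(dx))
            return dy < 0 ? CrossCommand.ZoomIn : CrossCommand.ZoomOut;                     // up = zoom in
        return dx < 0 ? CrossCommand.RotateClockwise : CrossCommand.RotateCounterClockwise; // left = clockwise
    }
}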
Selection: Pinching the fingers is a very easy and natural gesture to do; implementing it is another matter. First the thumb has to be detected with "findThumb". Pinching requires the two fingers to touch each other; unfortunately, with Candescent NUI, two fingers touching each other are considered as one finger. So implementing the pinch requires adding a threshold, as small as possible: when the distance between the index finger and the thumb has crossed the threshold, the pinch is activated. It is deactivated when the distance is bigger than the threshold again, which allows the fingers to finish the full gesture and touch each other without altering the command. In practice, the pinch requires some skill. The movement has to be done gently, not too fast, to let the application measure the distance, and the hand has to be a little on the side so the camera detects the fingers correctly. But once learned, the gesture is effective and not too exhausting in the long term.
5.8.3 Quick Summary
Zoom. Activation: 1 finger. Execution: vertical slide. Release: more than 3 fingers.
Rotation. Activation: 1 finger. Execution: horizontal slide. Release: more than 3 fingers.
Selection. Select: pinch the fingers. Unselect: release the fingers.
Table 10: Technologic gestures 4 summary
Chapter 6 The selected gestures for the evaluation
6.1 The final choice ............................................................................. 52
This chapter gives the reasons why some gestures were picked for the final evaluation and others were dropped.
6. The selected gestures for the evaluation
6.1 The final choice
Not all the designed and implemented gestures could be tested with users: it would have taken too much time. The evaluation needs to be short to avoid weariness and to keep users willing to spare some time for it. It therefore tests two sets of gestures for the three commands. The first set is a selection of iconic gestures, supposed to be intuitive and natural. The second set uses simple gestures, close to each other, chosen to be as effective as possible.
6.2 The iconic choice
The selection was quickly chosen: the hand-grab gesture was obvious, simple, and it worked pretty well during development compared to the other selections. For the zoom and rotation, continuous gestures were favored over limited ones. The gestures using depth for the zoom were put aside because of the difficulty of being accurate and the fatigue they cause. The choice was the step-by-step rope technique, which is simple to understand and to operate. For the rotation, the rope technique was also selected, at first using the absolute angle. Previous tests showed that this variant is hard for users to understand and to use correctly without a lot of practice, so it was dropped and replaced by the step-by-step rope technique, which is handier. The activations of the two gestures are differentiated by the number of fingers: one finger for the rotation and two fingers for the zoom.
Figure 25: Chosen gestures for Iconic
6.3 The technologic choice
The technologic gestures try to offer commands very different from the iconic gestures. Sliding gestures for zoom and rotation showed good results during development. Continuous gestures were also chosen to reduce the users' movements and to avoid constraining repositioning. Combining a vertical slide with a horizontal slide reduces the number of postures (1 finger, 2 fingers, etc.) needed to activate the commands and thus reduces the complexity for the user. For the selection, a simple move into a predefined area checks whether users can easily put their hand in the right position to execute a command without any trouble.
Figure 26: Chosen gestures for Technologic
Chapter 7 The Test application
7.1 The description of the application ................................................. 54
7.2 The levels ...................................................................................... 55
7.3 Editable levels ............................................................................... 58
7.4 The general interface ..................................................................... 59
7.5 The layers ...................................................................................... 60
7.6 The objects .................................................................................... 61
7.7 The targets ..................................................................................... 61
7.8 The feedbacks ................................................................................ 62
7.9 Animations .................................................................................... 65
7.10 The measures ............................................................................... 66
7.11 The logs ....................................................................................... 66
7.12 Remaining bugs and issues .......................................................... 67
7.13 The class diagram ........................................................................ 68
This chapter describes the application in depth. It provides all the information on what the application does, what can be done with it and how it works. All features are covered, from the interface to the log files.
7. The test application
7.1 The description of the application
The application focuses on the commands. There are three distinct commands to operate on objects: select, rotate and resize. A fourth command can also be considered: move. Moving an object requires combining the select command, which puts the object in "move mode", with the pointer, controlled by the right hand, to move it anywhere like a drag and drop. The application tests these four commands independently in five levels (activities) of exercises; the last level is a combination of all commands. There is also a training level to help the user get hands-on with the commands before the evaluation. The users go through the levels in the same order [FIGURE 28] and have to do the circuit twice, once for each group of gestures. For each level, a log file is created at the end with the history of the commands, the time spent on the tasks and the distance between the object and its target after an operation. Each task is evaluated individually, to observe the learning curve, and the whole level is also evaluated with general statistics on the commands and the overall time spent. The levels use XML files to create the lists of objects for the tasks. These XML files also bring flexibility by allowing objects to be added and removed; the objects are editable: types, properties, options and target properties. Before the evaluation starts, the application displays a settings window [FIGURE 27], operated with mouse and keyboard. It asks for the name of the tester, which must be unique; a verification insures this when the start button is pressed. The tester can then watch three tutorial videos explaining all the commands, mechanics and important information needed during the evaluation. The tester can select which activities he wants to do, but not their order. This is useful if the application were to crash or something else happened; of course, to restart the evaluation with the same name, all previously done activities have to be unchecked. The user can choose between the iconic and technologic gestures with a combo box. Two extra buttons allow setting the Kinect vertical angle, which can still be adjusted during the evaluation. After clicking on the start button the evaluation begins, and from then on only the Kinect and hands-free gestures are used.
Figure 27: Setup windows form
Figure 28: Progression of the level path
7.2 The levels
The main idea of the evaluation is to test every command separately. In each dedicated level, users have to perform tasks designed to test specifically the associated command and no other. In each level, the same task is repeated several times to evaluate the learning curve. The number of tasks must be high enough to observe whether a real evolution arises, but not so high that it causes weariness. Small differences between tasks may occur, such as moving objects in different directions or rotating objects to different angles; these small changes avoid routine and check whether the command adapts well to other situations. So each task has a different assignment. To provide results corresponding only to the dedicated command, the other commands have no influence on the objects or on the tasks in general; they are counted in the statistics but do not jeopardize the test. A final level regroups all commands to evaluate a full situation; this time all commands are active and tested. For each level, a "skip" button appears during the activity if the user takes too much time to conclude it. The user then has the choice to finish the level or to skip it and move on to the next one. This avoids getting stuck indefinitely; all statistics are saved even if the level is not totally done. Before a level begins, a description of the goal of the activity is presented, together with a non-interactive animated demonstration, the object color legend and a reminder of the commands.
Figure 29: Panel of the 6 activities
7.2.1 Training
The training lets the user try the different commands on multiple objects. There is no goal to achieve in this level, just getting used to the commands and the pointer. The training level has a minimum duration of ninety seconds; of course the user can spend more time exercising if he wants to. After ninety seconds, a button appears in the right corner, and the user just has to point at it to exit the level and go to the next one.
7.2.2 Selection
The selection level provides twelve tasks. Each task consists of getting the pointer over the object, selecting it and unselecting it while still pointing at the object. The twelve objects appear one after the other, spread equally over a circle [FIGURE 30]. Every object is at an equal distance, 400 pixels, from the next, and they appear in an order providing the same distance every time. This technique comes from the ISO 9241-9 document, chapter on the multi-directional pointing task. In the application code, it is easy to change the number of objects for the activity: there are 12 by default, but it can be modified without the trouble of computing the object positions and order. Everything is done automatically, as long as the number of objects is even. The "createPiecesInCircle" method takes the center point of the circle, the number of pieces and the radius of the circle, and returns a list of objects in the right order and at the right place. The order of appearance is important: it gives the opportunity to test the movements in every direction. The object can only be selected; no rotation, drag and drop or resizing is possible. For each task, the log file records the number of selections and the distance between the pointer and the object, to evaluate the accuracy. It also records the time needed to achieve each task and the number of possible rotation and zoom activations, which have no effect but are still counted for the statistics.
Figure 30: Appearance order on circle
7.2.3 Move
The move level tests the drag and drop [FIGURE 31]. In each of the 12 tasks, the user has to select an object, hold it with the left hand, move it into the blue area with the right hand and release it. Like in the selection level, the objects are spread around a circle for the same reasons and with the same options to change the parameters. The target area is located on the opposite side of the circle with respect to the object, so the distance to cover is always the same. The log file records the distance
between the object and the target zone at every release of the object. The object can only be selected and moved in this level.
Figure 31: move level demonstration
7.2.4 Rotation
This level focuses on rotations only [FIGURE 32]. The user still has to point at the object to activate it and then perform the rotations. Each rotation is different, and the user may do them clockwise or counter-clockwise, which can take more or less time depending on the choice. There are eight tasks in this level. Objects can only rotate; the other commands are deactivated. The log file records the number of rotations for each task and the angle difference between the object and the target.
Figure 32: rotation level demonstration
7.2.5 Resizing
The resizing level evaluates the zooms [FIGURE 33]. It works exactly like the rotation level: it counts all the commands, but only the zooms have an effect on the objects. There are eight tasks to accomplish, ranging from small zooms in and out, to test accuracy, to large ones, to test speed. After each release of a zoom command, the difference between the actual size of the object and its target is recorded.
Figure 33: Resizing level demonstration
7.2.6 Final
The last level regroups all the commands. Each task requires moving, rotating and resizing one object [FIGURE 34]. As this level demands more effort and takes more time to execute, only three tasks need to be accomplished to finish the activity. The three commands can be done in any order. For each task, all behaviors are recorded: time, selection distance to the target, rotations and zooms, as well as the global statistics for the whole activity.
Figure 34: Final level demonstration
7.3 Editable levels
To give the application more flexibility, a system of editable tasks has been implemented. It allows adding, removing and changing tasks before running the application. Instead of hard-coding the tasks in the application itself, it made more sense to read them from XML files. Every level has its own XML file containing all the objects, which can become tasks depending on the nature of the level. Each object, or piece as it is called in the XML file, has several properties: a type, an id, optionally a goal, and initial properties such as its position, size, orientation and scale. If it has a goal, it needs information about the target (final position), that is its location, size and orientation. Finally, a piece has a few options. It can be set as active or not, meaning that an inactive piece is not displayed until it is supposed to be; this is useful to go from one task to another. The other options concern permissions: whether the piece can be moved, selected, rotated, resized or even highlighted. Each file can contain as many pieces as wanted. The structure of the file is not very flexible: all the tags need to be correctly written.
For levels designed around a circle, like selection and move, only one piece needs to be described. In those levels each piece is the same: they are all spread around the circle. So the application takes the properties of the first piece and duplicates it as many times as needed. It automatically changes the id of each piece so that they appear in the right order during execution. The id numbers do not follow the same order as the creation of the pieces: the pieces are created in order, one after the other, following the circle, while the ids follow a different order so that the pointer moves in all directions, as explained before. From 0° to 180° the ids of the pieces take odd numbers (starting from 1) and from 181° to 360° even numbers. All the pieces except the first one are also deactivated.
Figure 35: xml piece example
7.4 The general interface
The Candescent library gives all the data from the Kinect needed to do the recognition. The data needs to be centralized; that is the job of the HandTracker class. It updates the data when new data is available, then updates its attributes, such as the pointer and the commands, with the recognizer. All the attributes can then be retrieved with get methods from any class holding a HandTracker object. This way, hand tracking can be used in any situation and can be adapted to other applications than just layers from the Candescent library. The recognizer is an interface. It allows choosing which gesture class the application wants to use: CrossGestures for the technologic gestures or RopeGestures for the iconic ones. It can easily adopt a new gesture class as long as it fulfills the interface contract. This application uses the graphic layers from the Candescent library.
7.4.1 The pointer
The pointer is controlled by the right hand. The HandTracker class returns only a point in two dimensions, but more is done in the pointing class. It takes new values of the pointer every 20 milliseconds to have a smooth movement. The point taken into account is the position of the first finger, which is the highest one on screen. It first checks that the finger is actually above the hand's palm, in case false fingers are detected, then checks the hand position, as it must not be too far to the left in the left-hand area. If the current point fails one of these conditions, the last pointer position is taken instead; otherwise, the point is processed. The new point is not stored directly: it goes into a smoother to increase the steadiness of the pointer. At first, the application used the Kalman smoothing
process. The results were not satisfying: the pointer movements were smooth, but the delay between the real movement of the user and the actual movement of the pointer was enormous, so the smoother was dropped. Instead, the application uses a simple homemade smoother. It takes the six last positions of the pointer and computes their average. It is not perfect, but it smoothes the pointer enough to be accurate and it is instantaneous. If any problem occurs, the pointer keeps its last position, and if the latter does not exist the position (-1;-1) is taken; this position is outside of the screen, so it cannot be wrong. Finally, the current position is stored, waiting for the HandTracker to get it.
7.4.2 The commands
The commands can be managed differently depending on the nature of the gestures, but in any case the command class needs to fulfill its contract and return its values: whether the selection is active, the rotation with its angle value, the zoom with its value, and the list of points. The list of points works like the pointer: it gathers the left hand's highest finger position and returns the last 25 positions, which allows displaying the green trail on screen to follow the flow of the movement more easily.
7.5 The layers
The layers are the basis of the levels. Each layer regroups its level's structure, rules and operations: it loads and creates its objects, recovers the gesture recognition, manages the objects' properties, paints the objects and the feedback, measures the user's actions and writes them into log files at the end. Most of the procedures are generic to all layers, which is why a main layer generalizes the other layers [FIGURE 36].
Figure 36: Layers classes structure
The layers have an identical structure for the levels. First comes a presentation with a title, a descriptive text of the activity, an animated demonstration, legends and a button to start the test. Between the presentation and the test, a countdown of 5 seconds starts, and then the activity begins. Finally, when the activity is done, an end screen congratulating the user with a "well done" is displayed [FIGURE 37].
Presentation → Countdown → Test → End screen
Figure 37: layers' levels structure
7.6 The objects
The objects in the application are called pieces. Each piece is a composition of simple basic geometric shapes [FIGURE 38]. There are about half a dozen different types of pieces, which is plenty for the application. All types of pieces inherit from the root class "Piece", which contains all the attributes and methods necessary to select, move, rotate, zoom and display the whole composition. The pieces have a status: they can be active (gray), hovered (blue) when the cursor is over them for rotations and zooms, and finally selected (orange) to be moved. Regarding the basic pieces: for instance, the class "CenterRectangle" creates a rectangle whose main location is in the middle instead of the top-left corner. This rectangle has four corners, which are used with the dot product to decide whether the cursor is over the rectangle or not. Things get a little more difficult when the rectangle changes its size or rotates around its center: all the corners have to be recalculated with trigonometric formulas. It gets really tricky when the whole piece is a composition of basic pieces, because the center of rotation is only valid for the central main piece. The other, remotely attached pieces share that same center of rotation, which is not their own; so the new locations of the remote pieces' centers have to be computed for every resizing and rotation [FIGURE 38], otherwise the whole piece would be deformed. The pieces also inform each other when the cursor is over them, so the whole piece is highlighted instead of just a part of it.
Figure 38: Demonstration zoom and rotation on a composite object
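The corner recalculation boils down to rotating and scaling points around the main piece's center. The following sketch, with hypothetical names, shows the kind of trigonometric transform involved; it is not the project's actual code.

```csharp
using System;
using System.Drawing;

public static class PieceGeometry
{
    // Rotates a point around a given center by angleDegrees and applies a scale factor,
    // which is what a combined resize and rotation of a composite piece requires.
    public static PointF Transform(PointF point, PointF center, float angleDegrees, float scale)
    {
        double angle = angleDegrees * Math.PI / 180.0;
        float dx = (point.X - center.X) * scale;
        float dy = (point.Y - center.Y) * scale;

        float x = center.X + (float)(dx * Math.Cos(angle) - dy * Math.Sin(angle));
        float y = center.Y + (float)(dx * Math.Sin(angle) + dy * Math.Cos(angle));
        return new PointF(x, y);
    }
}
```

The same transform can be applied both to the four corners of a "CenterRectangle" and to the centers of the attached sub-pieces, so the composition keeps its shape.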
7.7 The targets
In several levels of the application, the objects have objectives to reach. These objectives need to be clear and shown to the users. For tasks requiring shape, orientation or location changes, the objects come along with target objects. Targets and objects have the same properties, except that targets cannot be altered; they only provide visual support to the user. They are drawn in a red wireframe style to be clearly identified. The objects and the targets are kept in two separate lists of "Piece" objects.
Figure 39: Target example
7.8 The feedbacks
The feedbacks are a major part of an application. The interface must give the user information about his actions. The feedbacks have to be as clear and simple as possible, so the user directly understands the impact of his actions without losing focus on what he is doing. The feedbacks must not draw too much of the user's attention by overcrowding the screen with too much information.
To help the user find his bearings with the camera, at least at the beginning, a visualization of the hands is really important. The Candescent library provides methods to draw the contours of what it recognizes, and the application uses this feature to display the hands on screen. At first both hands were displayed, but preliminary tests showed that the right-hand visual feedback confused the users more than it helped them. This comes from the fact that the right hand controls the pointer, which is represented by a red aim. The right-hand visual is not attached to the pointer and is comparatively static, as it stays in the right corner of the screen. In practice, users looked more at the hand feedback than at the aim, which led to pointing mistakes. It was therefore decided to remove the right-hand feedback and keep only the red aim, which works better than displaying both. Only the left-hand contour feedback is displayed. Unfortunately, this is not enough for the user to know exactly which part of the hand is taken into account, so the application adds a green dot showing the real point of capture, the highest finger. As the hand moves, the green dot follows it and leaves a trail of smaller green dots [FIGURE 40], providing a feeling of movement which can be useful when making circles.
Figure 40: Left hand feedbacks
It is hard for the user to know whether he is too close to or too far from the camera, so a feedback is needed to clarify the situation [FIGURE 41]. The feedback is divided in two parts. First, the hand contour changes color if the hand gets too close to a detection limit. As there are two extremities, two colors are used: orange if the hand is too close to the camera and red if it is too far. Using two different colors lets the user recognize the problem and fix it more quickly. Of course, the user needs to know the meaning of the colors; this is where the second part takes place. When the hand color changes, an additional feedback appears: an arrow showing the direction the hand should move along the z-axis. The arrow points toward the user if he must move his hand back, or toward the screen if he must get closer. In addition, a text is displayed above the arrow to clarify the situation. At first the user will check the arrow and the text to learn the color meaning, but afterwards he will know exactly what each color stands for, will not have to look at the arrow and will be more effective. If the user's hand gets beyond the detection areas, a "no hand" message blinks for the missing hand; blinking messages draw the user's attention more quickly.
Figure 41: detection limits feedbacks
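A possible shape for this logic is sketched below. The depth thresholds and names are purely illustrative assumptions; the application's real detection limits are not documented here.

```csharp
using System.Drawing;

public enum DepthFeedback { Ok, TooClose, TooFar, NoHand }

public static class DepthLimits
{
    // Assumed near-mode working range in millimetres (illustrative values only).
    private const int NearLimitMm = 400;
    private const int FarLimitMm = 800;

    public static DepthFeedback Evaluate(int? handDepthMm)
    {
        if (handDepthMm == null) return DepthFeedback.NoHand;          // blink a "no hand" message
        if (handDepthMm < NearLimitMm) return DepthFeedback.TooClose;  // orange contour, arrow toward the user
        if (handDepthMm > FarLimitMm) return DepthFeedback.TooFar;     // red contour, arrow toward the screen
        return DepthFeedback.Ok;
    }

    public static Color ContourColor(DepthFeedback feedback)
    {
        switch (feedback)
        {
            case DepthFeedback.TooClose: return Color.Orange;
            case DepthFeedback.TooFar: return Color.Red;
            default: return Color.White;
        }
    }
}
```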
The screen itself needs feedback to guide the user [FIGURE 42]. To prevent the user from moving his hands everywhere, a detection area frame is always shown on screen. This frame avoids having the two hands too close to each other, which would result in bad detections, and it also shows where the best detection zone is.
Figure 42: Feedback of the frame for the detection area
Some text describes the frame's purpose. Two little red areas in the top and bottom left corners move the camera if the left hand points at them. Every time the user succeeds in a task, a congratulation message appears for a short moment in the top middle of the screen. Also, to let the user know where he is in an activity, information is displayed in the top right of the screen [FIGURE 43]: the name of the activity, the number of the current task and how many tasks there are. The application has buttons to validate actions from the user, like starting the next activity or skipping one if it takes too long to finish.
Those buttons are activated with the pointer, not the mouse. A pressed-button feedback is shown when the pointer is over a button.
Figure 43: Text & button feedback
The feedbacks for the technologic gestures need to be simple so they are understood immediately. The gestures use horizontal and vertical movements, so the feedback is a cross made of two double-headed arrows [FIGURE 44], with a text at each extremity reminding the user of the corresponding action. A red square shows the neutral zone in which the hand can move without changing the objects. For the selection, a simple blue line showing the selection area is sufficient. When an action is ongoing, a corresponding text appears at the bottom left of the detection frame.
Figure 44: Technologic feedbacks
The feedbacks for the iconic gestures are simple. The rotation and the zoom use the same technique, so their feedbacks are the same and the user needs to differentiate them. The application therefore uses different colors: purple for the rotation and orange for the zoom. The line between the center point and the finger grows with the distance, which makes it more perceptible for the user. The zoom has a little "+" or "-" near the center point to show whether it zooms in or out. The selection needs no feedback, as closing the hand is obvious and the contour of the hand makes it clear. Nevertheless, as for the technologic gestures, the name of the active command is displayed at the bottom left of the frame.
Figure 45: Iconic feedbacks
The objects use different colors to distinguish their statuses [FIGURE 46]. There are four statuses: normal, active, selected and success, plus a fifth color for the target objects. Each color is dedicated to a status: gray for inactive objects waiting to be awakened; blue when the pointer is over the object, making it ready to be rotated, resized or selected; orange when the object is selected and
ready to move. Objects turn green when they have reached their targets. The targets are white with red borders. To simplify placing the objects over the targets, the middle of each object is marked with a dot [FIGURE 47]. When objects have different orientations, it might be difficult to know where an object must go; with the middle dot, the user knows exactly where it should go.
Figure 46: Object's status color feedback
Figure 47: Object's middle dot feedback
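The mapping from status to color can be summarized by a small enumeration, sketched below. Only the colors come from the text; the enum and method names are hypothetical.

```csharp
using System.Drawing;

public enum PieceStatus { Normal, Active, Selected, Success, Target }

public static class PieceColors
{
    public static Color For(PieceStatus status)
    {
        switch (status)
        {
            case PieceStatus.Active:   return Color.Blue;    // pointer over the piece
            case PieceStatus.Selected: return Color.Orange;  // piece grabbed, ready to move
            case PieceStatus.Success:  return Color.Green;   // piece placed on its target
            case PieceStatus.Target:   return Color.Red;     // red wireframe border, white fill
            default:                   return Color.Gray;    // inactive piece
        }
    }
}
```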
7.9 The animations
Each activity is different, and the user needs to know what his next task will be. Before an activity starts, a little descriptive page tells the user what comes next. The description is a brief text, but however clear it may be, a demonstration is more effective, so the application has a system for animations. Animations cannot be interacted with; they can contain objects, pointers, text or any graphic. The animation advances one frame every 40 milliseconds by default, but this is adjustable. The objects' status can be changed by setting the options with the "setAllOptions" method. There are three possible ways to build an animation. The first one uses "Demostate" as a time reference: every 40 milliseconds (by default) Demostate is incremented, and objects are moved, rotated, resized or have their status changed incrementally for definite ranges of Demostate values. The second technique performs an action until it reaches its goal, for example rotating an object by 45° in steps of 3° per iteration, or moving an object to the right; it is quicker to implement but less flexible. The last technique is fully manual: each object follows an animation array, and at each Demostate increment the object reads the next tile of the array to get its new attributes. The position, the size, the angle and the status of the object must be indicated each time. It takes more time to write, but it gives more flexibility and precision.
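A minimal sketch of the "Demostate" tick is given below. The 40 ms period and the counter come from the text; the class and event names are hypothetical.

```csharp
using System;
using System.Windows.Forms;

// Drives the demo animations: increments DemoState every 40 ms and lets
// subscribers move, rotate, resize or restyle their objects for that frame.
public class DemoAnimation
{
    private readonly Timer timer = new Timer { Interval = 40 }; // one frame every 40 ms (adjustable)

    public int DemoState { get; private set; }
    public event Action<int> Frame;

    public DemoAnimation()
    {
        timer.Tick += delegate
        {
            DemoState++;                 // technique 1: DemoState is the time reference
            if (Frame != null) Frame(DemoState);
        };
    }

    public void Start() { DemoState = 0; timer.Start(); }
    public void Stop() { timer.Stop(); }
}
```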
7.10 The measures
The evaluation application needs measurements; otherwise it would not be an evaluation application. Data are recorded all along the evaluation. Each level records its own data, hence it has its own measurement system fitted to its needs. The log file of an activity is written at the end of it, so the data cannot be saved to the file directly. To keep track of each measurement of each task, a list of measurements is created. This list is composed of structures, themselves composed of attributes: each task has its attributes stored in one structure added to the list. At the end, the list is read to write the log file.
Depending on the nature of the task, the type of gestures and the measurement needed, the monitoring can differ. For example, the selection level needs to know the distance between the pointer and the object, so the measure must be taken when the selection command is activated and not when it is released. On the other hand, the move level needs the distance between the object and its target after displacement, so the measure has to be taken on the release of the selection command to know the final position. The iconic gestures have distinct commands, which allows the measures to be taken at the release of each gesture, as they cannot be confounded. The technologic gestures are based on a cross system that provides the rotation and the zoom within the same activation. Here the neutral zone is used to count the rotations and the zooms: every time the left hand goes back into the neutral zone, the current command progression is measured. The hitch is that if, for example, a rotation goes too far and the user wants to go back to rectify it, he has to reach the other side of the cross by going through the neutral zone, and this is counted as a finished movement. This does not happen with iconic gestures. A solution would be to count at most one movement per command activation and check the final values at release.
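The per-task storage described above can be pictured as a list of small structures, as in the hypothetical sketch below; the real attribute set depends on the activity and is not reproduced here.

```csharp
using System.Collections.Generic;

public struct TaskMeasure
{
    public int TaskNumber;
    public double TimeSeconds;      // time to complete the task
    public int Tries;               // number of command activations
    public double Distance;         // pointer-to-object distance (selection) or
                                    // object-to-target distance (move), depending on the level
}

public class MeasureRecorder
{
    private readonly List<TaskMeasure> measures = new List<TaskMeasure>();

    public void Record(TaskMeasure measure) { measures.Add(measure); }

    // The list is only read at the end of the activity, when the log file is written.
    public IList<TaskMeasure> All { get { return measures; } }
}
```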
7.11 The logs
At the end of each activity level, a log file is created. This log file gathers all the data acquired from the measurements. To be self-explanatory, each file contains general information identifying its nature: the application name on top, then the user name, the date the file was created and the type of gestures used during the test. The second part of the file reports the user's actions for each individual task, specifically according to the activity. For the rotation activity, for example, only rotations are taken into account and reported into the log file; other actions do not need to be recorded as they are deactivated. Of course, the log file of the final level, which regroups all the commands, contains every measurement. At the end of the file, after all tasks have been written with their times, numbers of tries and specific measurements, general information about the behavior during the whole activity is written, such as the average time per
task, the total time of the activity, the number of times each command has been activated and the positions of the pointer during the test.
The files are located in the log directory of the application. They are separated into sub-directories, first by modality and then by activity, which makes it quicker to regroup user evaluations by activity. In the latest version of the application, the filename is the user name plus the name of the activity, so it is also easier to regroup files of the same user.
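Put together, the layout amounts to something like the sketch below. The directory and file naming follow the text; method names and the exact file format are assumptions.

```csharp
using System.IO;

public static class LogWriter
{
    public static void Write(string modality, string activity, string userName,
                             string header, string[] taskLines, string summary)
    {
        // logs/<modality>/<activity>/<user>_<activity>.txt
        string dir = Path.Combine(Path.Combine("logs", modality), activity);
        Directory.CreateDirectory(dir);
        string file = Path.Combine(dir, userName + "_" + activity + ".txt");

        using (StreamWriter writer = new StreamWriter(file))
        {
            writer.WriteLine(header);            // application name, user, date, gesture type
            foreach (string line in taskLines)   // one block per task: time, tries, measures
                writer.WriteLine(line);
            writer.WriteLine(summary);           // averages, total time, command counts, pointer positions
        }
    }
}
```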
7.12 About the applications
The system works well and does not seem to crash for no reason; a lot of work was done to have it run smoothly. Special care was taken to give direct flexibility through editable levels, camera adjustment and the choice of levels and gesture modes. In the source code it is also easy to add new kinds of objects and gestures, to change the settings of the rotation and move levels, or even to activate the lock mode.
The lock mode is a system that, once activated, keeps an object in active mode as long as the user does not release the current command. Without the lock mode, the pointer always has to stay on the object for the commands to take effect. The lock mode can help in a few situations, like the rotation, where the pointer can easily slip off the object and stop the rotation; it lets the user focus on the commands rather than on the pointer. It has its flaws too. The release of a command becomes more important, since it is what unlocks the object, which can and did cause trouble because the commands can still modify the object in the meantime. Without the lock system, the user can simply point outside the object to make it inactive, and no errors can be made. It also avoids trying, without success, to modify an object when another one is already activated.
Objects can overlap each other without any problem, but this feature had to be implemented explicitly: without prior treatment of the objects' statuses and additional checks, the first created object would always be selected when objects overlap.
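A hedged sketch of such an overlap-aware hit test is shown below: a piece that is already selected or active keeps priority over the others under the pointer. The names and the tie-breaking rule are assumptions, not the project's actual logic.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

// Minimal stand-in for the real "Piece" class.
public class HitPiece
{
    public bool IsActive;    // pointer currently over it
    public bool IsSelected;  // grabbed by the selection command
    public virtual bool Contains(PointF point) { return false; }
}

public static class HitTest
{
    public static HitPiece PieceUnderPointer(IEnumerable<HitPiece> pieces, PointF pointer)
    {
        List<HitPiece> hits = pieces.Where(p => p.Contains(pointer)).ToList();
        if (hits.Count == 0) return null;

        // Prefer a piece already being manipulated, so an overlapping neighbour
        // cannot steal the command in the middle of an operation.
        return hits.FirstOrDefault(p => p.IsSelected)
            ?? hits.FirstOrDefault(p => p.IsActive)
            ?? hits.Last();
    }
}
```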
The application has been tested on three different machines with the same results. Nevertheless, one major issue remains. It does not come from the application directly but from the Candescent library: the algorithm that detects the palm center of the hand throws null exceptions in certain circumstances. Those circumstances could not be precisely identified, but the crash seems more likely when the hands are close to each other or when an outside object, like a head, chest, belly, another person or the armrest of a chair, is detected by the Kinect sensor. This bug might be fixed in future versions of the library. Otherwise the system is pretty robust.
The tutorial videos were made with the freeware Cam Studio 2.6 and the editing was done with Windows Movie Maker 2.6.
7.13 The class diagram
The class diagram shows the structure of the system and gives an overview of the connections between the classes. This diagram would be very helpful for someone who wants to get into the application, understand how it works and change it rapidly.
Figure 48: Application class diagram
Chapter 8 Evaluation
8.1 Conditions of the evaluation
8.2 Pre-evaluation
8.3 Range of testers
8.4 The questionnaire
8.5 Results
8.6 Analysis
This chapter presents the evaluation. It first describes the conditions in which the evaluation took place and the kind of testers who were evaluated. The important results of the evaluation are then described with graphs and commentary. Finally, the chapter ends with an analysis of the results to point out important observations.
8. The Evaluation
8.1 Conditions of the evaluation
The goal of the evaluation is to see whether one system of gesture recognition is better than the other. The evaluation was done under the same conditions for every tester. The same chair with armrests, at the same position, was used. The lighting in the room was kept identical to provide an equal quality of recognition, as the camera requires it: a dark or over-lit room decreases the performances. The first part of the evaluation shows and teaches the users all the commands, the interface of the application and some tips to perform better. This is done with three videos. The first one shows the environment and the basics, so that the user does not feel lost. The second and third videos, one for the technologic gestures and one for the iconic ones, demonstrate all the available commands and how to perform them. The videos are watched just before the evaluation. The second phase of the evaluation is the training. It helps the user put into practice everything the videos showed. The user uses this part to get his bearings with the recognition as a whole, the body placement, the arms, the hands and the head, which can be new and disturbing for some people. The camera's vertical angle is adjusted to the user's preference to give him the most comfort. Once set, the user can get used to the pointing and the commands. He can train as long as he wants and do as many manipulations as he thinks he needs to be comfortable with the environment. The training phase has a minimum duration of 90 seconds, set to encourage the users to try the commands before the real tests. When the user is ready, he simply points at the "next" button to begin the evaluation.
Each tester has to go through the tests twice, once for each type of commands. The activities are always done in the same order: first each command is tested separately from the others, and in the final level all commands are tested together. There are five activities: Selections, Moves, Rotations, Resizing and Final (a compilation of the four others). Each activity is composed of repetitive tasks. One evaluation has a total of 43 tasks across the 5 evaluated activities.
Each test has a presentation page. These pages explain everything there is to know to accomplish the tests: a text description and an animated demonstration of the tasks. They also remind the users, if necessary, of all the commands and the color meanings of the objects. Indirectly, those pages provide a pause between tests, so the users can rest their arms. To avoid a brutal start, a 5-second countdown is inserted between the presentation page and the activity. A within-subjects design is used for the evaluation [TABLE 11]: all subjects are assigned to all cases. To avoid bias and the learning effect, as each test is performed twice, the testers are divided in two groups.
The first group starts with the technologic gestures and finishes with the iconic gestures. The second group does the opposite: it starts with the iconic gestures and finishes with the technologic gestures.
                    Group 1                 Group 2
# of users          5                       5
First experiment    Technologic gestures    Iconic gestures
Second experiment   Iconic gestures         Technologic gestures
Table 11: within subjects experiment progress
The evaluations were done at the same place under the same conditions. The user does his tasks alone; an observer watches the operations from a distance. The observer cannot be too close to the tester, otherwise the camera sees the observer and the recognition fails. The observer only interferes, with small oral tips, if the tester is lost or if a technical problem requires help to go on.
At the end of the evaluation of each type of gesture, the user must fill in a questionnaire about his perception of the tests, the commands and his physical fatigue. As the questionnaire was written in English and some of the testers do not speak the language, the observer translated the questions for them and filled in the forms.
The whole evaluation takes around 35-40 minutes to complete: around 18 minutes of actual activities, to which must be added 5 minutes of tutorial videos, a minimum of 3 minutes of training (twice 90 seconds), the time for the activity presentations and the completion of the questionnaires.
8.2 Pre-evaluation
During the development, pre-evaluations were carried out to see, from an external point of view, what was done right, what was missing and, more importantly, what was done wrong. The evaluators were given the task of going through the whole application and giving their impressions. The first thing to come out was that the tasks were too hard with the commands, which made the evaluation too long and exhausting to go through. One problem was the pinching for the selection: it was a bit too sensitive to be executed properly and easily. The other command giving trouble to testers was the rotation with the rope technique and its absolute angle; it was hard for the testers to understand how the command operates. These two commands were changed. Each evaluation also went from 49 tasks to 43 by removing 2 rotation tasks, 2 resizing tasks and especially 2 final tasks which were really hard to execute and took a lot of time; this reduced the overall time by around 30%. The complexity of the rotation with the rope technique was reduced: instead of using the direct absolute value of the angle, the value is rounded to the closest multiple of ten. Another issue was the confusion between the pointer (red aim) and the right-hand representation feedback. The detection area frame, the camera adjustment, the object center point, etc. also came from observations and feedback of these pre-evaluation tests.
8.3 Range of testers
Some pre-tests were done with technophobic people, which ended up with bad results. It was therefore decided to use testers accustomed to today's motion recognition technologies, or at least not afraid of using them. All testers have used at least once a technology like the Kinect, the Wii motion controller or a similar device. They were all male and aged between 24 and 35 years old. It is not that women were excluded from the evaluation; women were tested during the pre-evaluation, it just happened that none were available for the evaluation. The testers came from very different fields: three worked in IT, the others worked in domains not related to technology.
8.4 The questionnaire
At the end of each evaluation, the testers had to fill in a short questionnaire. The questions were inspired by table C.1, Independent rating scale, of the ISO 9241-9 document: some questions were kept unchanged or modified, others were removed and new ones were added. The look and structure were also inspired by the C.1 and C.2 tables, and a third, original design was finally chosen. The questionnaire is qualitative: each question is rated from 1 to 7, and the user expresses his sensation and perception of the system by giving grades. The questionnaire is divided in two parts. The first part focuses on the commands; the questions try to cover all aspects, from general comfort, precision and ease of use to feedback quality. The second part is about fatigue; it tries to figure out which parts of the body are exhausted and the overall fatigue.
Figure 49: Questionnaires for evaluation
8.5 Results
The data are now recorded in the log files. They need to be extracted and rearranged into tables to be turned into information. Excel and R are used to analyze the data: Excel to get a clearer view of the data in tables and to quickly produce simple graphics and results, and R to create more sophisticated graphs that are harder or impossible to do with Excel, like the box plots, density curves and distributions.
Most of the information below is based on time, to try to figure out whether one technique is better than the other. The structure of the results is almost the same for each activity. First, the average times of the tasks and of the activity are compared to see which performed best. Then, to check the conditions for the statistical test, some numerical results are needed, like the standard deviations, variances and quartiles. Along with them come graphs such as the data repartition, the densities and the distribution curves, to get a better view of the situation.
Statistical tests, like Student's t-test, are then performed. The two-sample t-test is one of the most commonly used hypothesis tests. It is applied to check whether the difference between the means of two groups is really significant or whether it is instead due to random chance. It states the null hypothesis that the means of two normally distributed populations are equal. The t-tests have been conducted with R and Excel; an internet site (http://www.physics.csbsju.edu/stats/) also gives the opportunity to do the test simply and provides complete results with analysis. When comparing the three methods, small differences can be noticed, not in the t-test itself but in the calculation of the quartiles; these small differences do not change the final decision on whether or not to reject the null hypothesis. Paired t-tests were used, as it is one group of subjects that was tested twice. The internet provides useful resources like a short questionnaire (Appendix Correction Welch, http://biol09.biol.umontreal.ca/BIO2041e/Correction_Welch.pdf) that guides people in choosing the right statistical test according to their data.
8.5.1 Selection
Times overview by task
Let's take a look at the learning curve for the selection. On the next graphics [GRAPH 1 & 2], we can see a clear difference in time between the first 3 to 4 tasks and the following ones: with both types of gestures, it takes around three tasks to adopt the command. Then the curve stabilizes with less variation. The iconic selection is better more than 80% of the time. We can see a peak in the iconic curve for the fourth task: it comes from two testers who took around 11 seconds instead of the usual average of
3 to 5 seconds for this particular task, which pulls the mean up. Otherwise, the smoother iconic curve suggests better performance than the technologic one.
Graph 1: Average time for each task from Technologic and Iconic side by side
Graph 2: Selection: average time's comparison by task
Times overview by tester
Let's take the average time of the selection activity for each tester and put them against each other [GRAPH 3]. We can notice that the technologic and iconic times are close in most cases. It is interesting to note the crossing of the lines between testers 5 and 6: as testers 1-5 start with the technologic gestures and testers 6-10 begin with the iconic ones, it looks like the second walkthrough of the selection activity gives better results, except for tester 10.
Graph 3: Selection: average time's comparison by tester
Here is a quick summary of the data. The average time for the iconic selection is around 400 milliseconds better than its counterpart. The standard deviation is also smaller with iconic, which indicates a distribution more concentrated around the mean. It tells us that about 65% of the testers completed their tasks between 3.35 and 5.18 seconds, against 3.3 and 5.94 seconds for technologic.
Data set summaries (10 data points each):

                         Technologic        Iconic
Mean                     4.621              4.260
Low                      2.95               3.32
High                     6.99               6.52
First Quartile           3.99               3.64
Median                   4.425              4.184
Third Quartile           4.69               4.462
Standard Deviation       1.32               0.915
Variance                 1.742              0.837
95% CI for actual Mean   3.68 thru 5.562    3.606 thru 4.914
Histogram: The histogram [GRAPH 4] shows the repartition of the data. The two histograms are overlaid to make the differences visible. The data are close to each other. The technologic repartition is more spread out, which results in a flatter distribution curve, while the iconic distribution curve is narrower as the data are more concentrated.
Graph 4: Histogram + Distribution curves + densities
Means comparison: This box plot [GRAPH 5] quickly shows that the data are really close to each other. There is nevertheless a small advantage for iconic, as the mean is lower and the data are more concentrated around it, with fewer far outliers than technologic.
Graph 5: Box plot Selection Technologic and Iconic
Statistical t-test: The t-test uses the paired data as parameters and results in a p-value of 0.48. The null hypothesis cannot be rejected: there is not enough statistical evidence to conclude that there is a significant difference between the two selection gestures [TABLE 12].
Table 12: Selection: times t-test table
Tries and errors
Looking at the graphic [GRAPH 6] and the table [TABLE 13], the number of tries required to achieve a task is lower with iconic than with technologic, but the difference is not significant according to the t-test.
Graph 6: Selection: number of tries by testers
Table 13: Selection: tries t-test table
Effective index of difficulty
Fitts's law is used to measure the difficulty of the tasks and the performance of the users during the activity. "This is the measure, in bits, of the user precision achieved in accomplishing a task." (ISO 9241-9, B.5.1.3 Effective index of difficulty) The Shannon formulation is used for the effective index of difficulty (IDe) [FIGURE 50], where D is the distance to the center of the target, in this case 400 pixels, and We is the effective size of the target. The effective size is calculated from the standard deviation of the selection distances to the center of the target, multiplied by a constant of 4.133; this adjusts the target width under the assumption that 96% of the users hit the target within that range. The results give average throughputs of 0.87 bps for technologic and 0.93 bps for iconic. The difference is small and insignificant. These are low results, considering that the indexes of difficulty [TABLE 14] were below 4, which keeps the tasks in the low precision category (between 1 and 4). One reason is the time it takes to activate the selection commands and, more importantly, the time to release them.
Figure 50: Index of performance formulas
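Figure 50 is not reproduced in this text version. For reference, the ISO 9241-9 Shannon formulation it corresponds to can be written as follows (a reconstruction based on the standard, not a copy of the figure):

```latex
\[
  ID_e = \log_2\!\left(\frac{D}{W_e} + 1\right), \qquad
  W_e = 4.133\,\sigma, \qquad
  TP = \frac{ID_e}{MT}
\]
% D   : distance to the target center (here 400 pixels)
% W_e : effective target width, computed from the standard deviation sigma of the hit distances
% MT  : mean movement time of the task; TP is the resulting throughput in bits per second
```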
Table 14: indexes of difficulty with average throughputs
8.5.2 Move
Times overview by task
Like the selection activity, moving objects is quick to learn. The graphics [GRAPH 7 & 8] show that the average time per task drops rapidly after the first two tasks. The times are then pretty similar, with an average of 5.47 seconds per task for technologic and 4.85 for iconic. To check whether the difference is significant, a t-test was done on these task times. The outcome is a p-value of 0.059, which is just above the 5% threshold: slightly too high to reject the null hypothesis, but close enough to suggest that the iconic gesture may perform better.
Graph 7: Average time for each task from Technologic and Iconic side by side
Graph 8: Move average time's comparison by task
Times overview by tester
If we look at the data by tester, the graphics [GRAPH 9] show that 70% of the testers performed better with iconic. The average time also gives a clear advantage to iconic over technologic, with means of 4.87 seconds against 5.48 seconds respectively.
Graph 9: Move average time's comparison by tester
Data set summaries (10 data points each):

                         Technologic        Iconic
Mean                     5.475              4.870
Low                      3.83               3.92
High                     8.91               6.97
First Quartile           4.72               4.07
Median                   4.961              4.852
Third Quartile           5.9                5.16
Standard Deviation       1.44               0.89
Variance                 2.073              0.785
95% CI for actual Mean   4.447 thru 6.502   4.236 thru 5.503
Histogram: The data repartitions [GRAPH 10] overlap well, and even the densities look alike. However, the technologic distribution is flatter and lower than the iconic one, which reveals a large difference in the standard deviations.
Graph 10: Histogram + Distribution curves + densities
Means comparison: The box plot [GRAPH 11] shows that the medians are practically equal, but the iconic data are more concentrated than the technologic ones. One outlier was detected for each type.
Graph 11: Box plot Move Technologic and Iconic
Statistical test: The technologic data is consistent with a log-normal distribution: P = 0.37, with geometric mean = 5.491 and multiplicative standard deviation = 1.342. The iconic data is consistent
with a log-normal distribution: P = 0.61, with geometric mean = 4.892 and multiplicative standard deviation = 1.231. The t-test returns a p-value of 0.27 [TABLE 15], which is too high to reject the null hypothesis: there is no significant difference between the two means.
Table 15: Move: times t-test table
Tries and errors
Looking at the graphic [GRAPH 12] and the table [TABLE 16], the number of tries required to achieve a task is lower with iconic than with technologic, but the difference is not significant according to the t-test.
Graph 12: Move: number of tries comparison
Table 16: Move: tries t-test table
8.5.3 Rotation
Times overview by task
The iconic rotation has a clear advantage in the first tasks, but the difference shrinks over time until the two become practically equal [GRAPH 13].
Graph 13: Rotation: average time's comparison by task
Times overview by tester
The iconic rotation is a clear winner when comparing times by tester, with better performances for 90% of them [GRAPH 14]. The iconic average time is also well below the technologic one, with a
difference of 3.5 seconds. 75% of the testers finished their tasks under 17.5 seconds against 20 seconds with technologic.
Graph 14: Rotation average time's comparison by tester
Data set summaries (10 data points each):

                         Technologic        Iconic
Mean                     17.57              14.07
Low                      9.14               9.18
High                     28.7               22.1
First Quartile           14.8               11.2
Median                   16.18              14.07
Third Quartile           20.2               17.5
Standard Deviation       5.69               4.13
Variance                 32.38              17.06
95% CI for actual Mean   13.50 thru 21.64   11.54 thru 17.44
Histogram: The histogram [GRAPH 15] shows that the technologic data are more spread out than the iconic ones. The distribution curves confirm that the iconic data are more concentrated, with a smaller standard deviation.
Graph 15: Histogram + Distribution curves + densities
Means comparison: The box plot [GRAPH 16] shows a clear difference between technologic and iconic. Even with two outliers, the technologic rotation cannot compete.
Graph 16: Box plot Rotation Technologic and Iconic
Statistical test: The technologic data is consistent with a normal distribution: P = 0.33, with mean = 18.12 and standard deviation = 7.095. The iconic data is consistent with a normal distribution: P = 0.52, with mean = 14.80 and standard deviation = 4.878. The t-test returns a p-value of 0.005, so the null hypothesis is rejected with high confidence: there is a significant difference between the two means, and the iconic rotation performs statistically better [TABLE 17].
Table 17: Rotation: times t-test table
Tries and errors
Looking at the average number of tries per task, we can see the learning curve [GRAPH 17]. The technologic curve shows a reduction of errors over time; the testers seem to gain confidence. The iconic curve is more stable, as the rope technique counts fewer activations than the cross system, so not much can be concluded from it.
Graph 17: Rotation: number of tries comparison
8.5.4 Resizing
Times overview by task
What is clear on these graphics [GRAPH 18] is that, once again, iconic performed best. The difficulty of the tasks seems to have been equal for the two types of zoom, as the curves follow the same kind of path.
Graph 18: Resizing: average time's comparison by task
Times overview by tester
Although not as obvious as for the rotation, the iconic zoom still performed best for 80% of the testers [GRAPH 19]. The average times mark a clear disparity, with a difference of 2.3 seconds in favor of iconic: 12.3 seconds against 14.6 seconds.
Graph 19: Resizing: average time's comparison by tester
Data set summaries (10 data points each):

                         Technologic        Iconic
Mean                     14.63              12.28
Low                      9.61               9.16
High                     19.8               15.7
First Quartile           13.5               11.2
Median                   14.61              12.06
Third Quartile           16.2               13.5
Standard Deviation       2.79               1.86
Variance                 7.784              3.46
95% CI for actual Mean   12.64 thru 16.63   10.95 thru 13.61
Histogram: Like with the rotation results, the iconic repartition of the data is more compact and less spread [GRAPH 20].
Graph 20: Histogram + Distribution curves + densities
Means comparison: The iconic results are clearly better than the technologic ones on this box plot [GRAPH 21]. It is however important to notice that R treated two good technologic results as outliers, which slightly increased the difference between the two gestures.
Graph 21: Box plot Resizing Technologic and Iconic
Statistical test: The technologic data is consistent with a normal distribution: P = 0.55, with mean = 14.70 and standard deviation = 3.648.
The iconic data is also consistent with a normal distribution: P = 0.94, with mean = 12.34 and standard deviation = 2.309. The probability of this result under the null hypothesis is 0.030 [TABLE 18]. The null hypothesis is therefore rejected: there is a significant difference between the two means, and the iconic zoom performed better with this panel of testers.
Table 18: Resizing: times t-test table
Tries and errors
The learning curve is less convincing than the rotation tries learning curve [GRAPH 22]. Still, it shows some improvement when comparing the first four tasks with the last four.
Graph 22: Resizing: number of tries comparison
8.5.5 Final
Times overview by task
The time taken for each task varies a lot between testers. However, with both types of gestures, the second task has slightly better results overall than tasks 1 and 3 [GRAPH 23].
Graph 23: All times for each task from Technologic and Iconic side by side
Times overview by tester
If we look at the data by tester [GRAPH 24], no clear trend appears for either type of gesture; even the number of best scores is split 50-50. There is no sign that the second walkthrough was better than the
first one. Even the spread of the times is practically identical. The means are practically equal, with one second of difference in favor of technologic at 54.3 seconds.
Graph 24: Final: average time's comparison by tester
Data set summaries (10 data points each):

                         Technologic        Iconic
Mean                     54.30              55.34
Low                      32.5               31.8
High                     69.1               73.3
First Quartile           42.2               49.5
Median                   54.74              56.96
Third Quartile           64.8               49.5
Standard Deviation       12.3               12.1
Variance                 151.29             146.41
95% CI for actual Mean   45.52 thru 63.09   46.65 thru 64.02
Histogram: The histogram [GRAPH 25] shows similar repartitions and densities and, more importantly, nearly identical distribution curves. The box plot shows differences, but they seem to be due to the value considered an outlier in the iconic data.
Graph 25: Histogram + Distribution curves + densities
Statistical test: The t-test finally returns a p-value of about 0.8, so the null hypothesis cannot be rejected and no difference can be shown between the two groups [TABLE 19].
Table 19: Final: times t-test table
Tries and errors
Testers take around 9 commands with the iconic gestures and 15 with the technologic gestures to achieve a task requiring a movement, a rotation and a zoom [GRAPH 26]. A really optimistic goal would be to achieve a task with three commands, which is almost impossible because it would mean no mistakes at all. The difference is significant, but as the command counting systems of the two gesture types differ, they cannot really be compared. Nonetheless, the table [TABLE 20] shows that with the technologic gestures it takes far more rotations than any other command.
Graph 26 Final: number of tries comparison
Table 20: Final: tries and errors t-test table
8.5.6 Summaries
Here is an overview of all the activities; it brings together the statistics so they can be compared quickly. The first summary [TABLE 21] regroups the time performance statistics. The second summary [TABLE 22] gathers the number of errors made by the users during the evaluations. Note that errors are counted as the total number of commands done during the activity minus the minimum number of commands needed to achieve it. Nevertheless, the error-rate results must be taken with great caution: as mentioned previously, the way the two types of gestures count the number of commands differs, specifically for the rotation and zoom commands, which produces large differences that must be kept in mind.
Performances summary
Table 21: Activities summaries table
Errors summary
Table 22: Activities errors summaries table
8.5.7 Questionnaire
Here are the results of the questionnaires [TABLE 23]. The scale for each question goes from 1 to 7. The data is divided in two sections. The overall means of every question are compared, and two additional columns give the results of the groups' first and second attempts. The idea is to check whether a significant difference exists between the first walkthrough and the second one, which is interesting to see whether the perception of fatigue grows over time. Statistical t-tests are done to check whether the differences are significant.
First, we can see that the testers thought the technologic gestures required less effort and were smoother than the iconic ones. The difference is small, and there is no evidence that the testers' perception reflects actual facts. On the other hand, the iconic gestures are perceived as more accurate, with a large difference that is statistically significant. The testers also seem to have preferred all the iconic commands, especially the selection, with a big, significant difference; otherwise the results are close to one another. It is important to note that the testers liked the technologic feedbacks better than the iconic ones: they found them clearer, with more useful information. Overall the grades are average, with no real differences between the two modes except for the selection and the feedback quality. The comparison between the first and second attempt does not show anything: testers did not generally give better grades on the first or the second walkthrough.
On the fatigue side, the technologic gestures seem to be less tiring, which agrees with the second question of the form about the effort required for operations; but overall the general fatigue was practically equal. What really stands out is the arm fatigue: every tester complained about it, with no difference between technologic and iconic, and the testers' arms were exhausted far more than any other body part. Finally, comparing the first attempt with the second, the fatigue from the first walkthrough can be felt in the second one, as the testers are almost always a little more tired.
Table 23: Questionnaires' results table
8.6 Analysis
This analysis is divided in several parts. It first covers general observations on the comments and behaviors of the testers during the evaluation, then analyzes the quantitative and qualitative results of each command separately, and concludes with a synthesis of the overall performances.
First of all, users need practice to understand, remember and correctly use the commands. The environment of the application is new and users have to find their bearings. The style of gestures is particular and not very common: using the right hand just for pointing and the left hand exclusively to execute commands can be perturbing and confusing at times. Users also have to learn where their hands must stand for a good recognition. All these little inconveniences are reduced over time. The reason for the training level was to let users try all the commands without being tested, and mainly to avoid them being totally lost in the first evaluated activities. And it worked: users found their bearings, played with the objects and learned how the system works. It is also interesting to notice that users do not perform the gestures the same way. Take the pointing, for example: some users point with just the index finger, which results in a smooth and accurate movement. Others point with the whole hand, all fingers outstretched; the pointer then shakes more and it is harder to be precise. Others have the palm of their hand facing the camera, which works pretty well but can be a little harder to aim accurately with.
The selection level is an easy activity with simple, identical tasks: the distance between the targets is always the same, their size is equal, and only the direction the pointer has to move changes. One mistake was observed repeatedly: users sometimes forgot to release the object before going to the next one and lost a few seconds wondering why the task was not validated. They indeed had to select the object and then release it with the pointer still on the object. The iconic selection can only be performed one way, but the technologic one can be performed differently. Usually users put the hand down with a vertical movement of the whole arm; one user used his elbow instead to reduce the effort, rotating his forearm from vertical to almost horizontal until it reached the selection area, like pushing a button, and it worked well. From a statistical point of view, neither of the two selections stands out. Even though the iconic selection had a better average time, 10% lower than the technologic one, with fewer outliers and a better median, the t-test does not confirm the advantage. 56% of the tasks were performed faster with the iconic selection command and 60% of the users finished the activity quicker with it.
The Move activity also uses the selection command. We will see whether it leads to the same conclusion as the selection activity, namely that there is no difference between the two commands. First, some testers liked this activity: they found it fun and it was their favorite. A few little missteps happened though. In iconic mode, some users tried to use their left hand to move the objects, forgetting it had to be the right hand. Despite the fact it was specified in the videos, some users tried very quick, sudden movements, which worked badly; they quickly understood they had to go slower. Otherwise the activity went smoothly. On the quantitative side, it is the same as with the selection: the overall statistics are better with iconic by 10%, the standard deviation is 40% smaller (0.89 versus 1.44), and even the number of tries is smaller with iconic by 10%. But the t-test makes it clear: just as for the selection, there is no significant difference between the two methods.
So on the overall performance of the first two activities that evaluate the selection commands, there is no significant difference between the two methods. But by looking at the qualitative results, it is clear that the iconic selection is an obvious winner.
The rotation activity is harder than the first two activities, as it requires more accuracy. In this activity, as well as in the next one, resizing, the two commands differ by the type of gesture they require: the technologic rotation is controlled by a horizontal sliding movement, while the iconic rotation requires a circular movement from the user. The two approaches are very different. Technologic uses an automatic speed to rotate objects, whereas iconic waits for the user's movement to increase or decrease the rotation. With technologic, users were a little disturbed by the variable speed
of the rotation; they found it hard to manage correctly. The technique used to get the object in the right orientation differed between users: some used the neutral zone, as designed, to stop the object's rotation, while others used the pointer as a regulator by pointing out of the object to stop it. Note that this last technique would not be possible if the lock mode were activated. For the iconic rotation there is only one way to execute the command: make circles. Both techniques shared one issue: the pointer slipping out of the object as it rotates. The user has to carefully aim near the center of the object or readjust the pointer during the operation, which costs some concentration. The results are simple: the iconic rotation performed better, and all statistics point to the superiority of the iconic gesture. The average time is better by 3.5 seconds, the median by 2 seconds, 90% of the users performed better with iconic and, finally, the t-test confirms the significant difference of the means with a p-value below 1%. Qualitatively, users favored the iconic gesture, but the difference is very small, 4.2 against 4.3, and not significant. The users liked the simplicity and the feedback of the technologic rotation, except for the variable speed, which they think is not a bad idea but requires more practice; they also liked the fact that it does not require much effort. On the iconic side, they found it more entertaining to use, easier and more natural than the technologic rotation gesture, but they found the feedback a little shallow and would have liked to see more information.
The resizing activity is very close to the rotation activity, and even the gestures are similar. The technologic zoom still uses a continuous movement, controlled by a vertical slide; the iconic zoom still requires making circles around a center point, but its activation posture is different from the rotation one. The comments about the rotation therefore still hold. The main difference is that this level deals with objects of different sizes. Users found it harder to point at smaller objects: it takes more effort and concentration to be accurate and requires being more stable, with both modes. The two methods performed better on average than the two rotations, especially technologic; this improvement may come from the learning effect of the previous, similar activities. However, the iconic zoom is still ahead of the technologic zoom, just like for the rotation, even if the differences are not as marked as before. Iconic is still faster 80% of the time, and the means are closer, 12.3 seconds for iconic versus 14.6 for technologic, but iconic keeps a clear advantage. Finally, the t-test rejects the null hypothesis: there is a significant difference between the two means, and the iconic zoom performed better with this panel of testers. The users still preferred the iconic gesture for its ease of use. Interestingly, users liked the zooms more than the rotations by at least 0.5 points; the reason may be that the users had gained skill, had less difficulty passing the tasks, and their perception improved accordingly.
The final activity was easily the hardest of them all. It regroups all the commands at once, every command has an impact on the objects, and the order in which the commands are performed is up to the users. They actually did not all use the same order: some started by moving the object to its target, others resized and rotated it before moving it. Even though the commands reminder was shown right before the activity, most users had partially forgotten how to perform some commands. In their opinion, the difficulty was a little high: the precision required for the three commands combined was too demanding. This activity was the most exhausting, physically and mentally, as it requires more concentration to manage the three commands; in both cases, users were happy to be done with it and that it was not any longer. Iconic gestures performed better in almost all previous activities, with more or less significance, so one would expect the final activity to follow the same path. Actually, it does not. The technologic gestures catch up with the iconic gestures and give a very close score, even a slightly better overall average time, by 1 second. The results are very similar: the distribution curves almost overlap, and the statistical test gives a high p-value of about 0.8, which means no difference can be shown between the two methods. If we look for reasons, some elements come out of the users' comments. On the iconic side, users had to switch between commands, and some got confused between the rotation and zoom activations, which require one or two fingers. On the other hand, the technologic rotation and zoom share the same activation, and the feedback directly reminds the user how to use the two commands at the same time.
To summarize, iconic gestures have better, or at least equal, results compared to the technologic ones when they are used individually, without perturbation from the other commands. For selecting and moving objects, the quantitative tests do not show any significant difference, even though the overall figures are slightly better for iconic; the difference is rather qualitative. Users liked the iconic selection far more, with an outstanding 6.5 out of 7 versus 4.4 for the technologic selection. Closing the hand as if grabbing an object works well: it is simple to use and to remember, and even technically it is simple to detect with really good accuracy, without requiring a specific activation area. This does not mean the technologic selection is bad; it also works well and is simple to use and detect, it just does not measure up to the iconic selection. For the rotations and zooms, it is the opposite: users preferred the iconic gestures with only a tiny, insignificant advantage on the qualitative side, but the quantitative results show a clear benefit in using the iconic gestures to perform rotations and zooms. When it comes to using all the commands together, however, the two methods seem to be equivalent; there is no evidence that one method is better than the other. Out of the ten testers, two had their best results in all five activities with the iconic gestures. With technologic, only one tester almost did the grand slam with four activities out of five, and had an overall success rate of nearly 70%. In total, counting every task, technologic loses with 44% of the wins versus 56% for iconic.
Chapter 9 Extra Applications
9.1 Gesture Factory Application
9.2 Bing Map Application
Chapter 9 covers everything that is outside the final evaluation: the two other applications developed during the project that were not part of it.
9. Extra applications
9.1 Gesture Factory application
This is the first application built in this project. Its goal was to quickly design, develop and test gestures and objects using the Kinect and the Candescent library. The application was never meant to be finished, polished or delivered to anyone; it is just a place to try and retry things over and over again. All twenty gestures described in this document are in the Gesture Factory. Some adjustments have been made in the test application, which regroups only the gestures used for the evaluation, to improve their quality, simplicity and robustness. The application is pretty simple: it has a screen in which a layer can display information, and buttons to switch quickly from one type of display, recognition or both to another, which gives a quick way to compare designs.
9.2 Bing Map application
The purpose of the Bing Map application was to use the different commands in a real situation. The application lets the user browse the entire world through a Bing Maps applet in a .NET environment. The NUI used in the test application is mapped to the mouse controls: the right hand drives the mouse pointer and the commands recreate mouse actions such as click, double-click, wheel roll, and so on. The user can navigate within the map and perform zooms. The idea was to integrate this application at the end of the usual tests: the user would have had to execute a scripted scenario, such as finding the Eiffel tower in a minimum of time, and/or to browse freely and give his impressions. The application was dropped early in the development because of the time the tests already required. Indeed, the previous evaluation takes around 15-20 minutes and is run twice, once per modality, without counting additional elements such as videos and questionnaires; adding another test would have made the whole session too long. Nevertheless, the application runs and the user can use Kinect to navigate the map. Some commands do not work perfectly yet, such as the detection of false double-clicks. It is also important to note that the Kinect commands replacing the mouse controls apply not only to the application but to the whole operating system, so users have to be careful: the real mouse device cannot be used while the application is running, which may cause problems. The application needs more development time to be properly usable. Once well calibrated, it also opens the option of using Kinect as a full default input device replacing the mouse.
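The report does not detail how the hand position is bridged to the mouse; the following is a minimal sketch of how such a bridge can be built on Windows with the user32 functions `SetCursorPos` and `mouse_event`, assuming the hand position has already been normalised to the [0, 1] range by the recognition layer. Because it injects real system-wide mouse events, it has exactly the side effect mentioned above: the commands affect the whole operating system, not just the application.

```csharp
// Minimal sketch: drive the Windows mouse pointer from a normalised hand
// position and a "grab" state, using the classic user32 entry points.
// The normalisation of the hand coordinates is assumed to happen elsewhere.
using System.Runtime.InteropServices;
using System.Windows.Forms;   // only used here to read the screen size

public static class MouseBridge
{
    [DllImport("user32.dll")]
    private static extern bool SetCursorPos(int x, int y);

    [DllImport("user32.dll")]
    private static extern void mouse_event(int dwFlags, int dx, int dy, int dwData, int dwExtraInfo);

    private const int MOUSEEVENTF_LEFTDOWN = 0x0002;
    private const int MOUSEEVENTF_LEFTUP   = 0x0004;

    private static bool pressed;

    // x and y are the hand position normalised to [0, 1].
    public static void MoveTo(double x, double y)
    {
        var bounds = Screen.PrimaryScreen.Bounds;
        SetCursorPos((int)(x * bounds.Width), (int)(y * bounds.Height));
    }

    // Call with the current grab state; a state transition generates press/release.
    public static void SetGrab(bool grabbing)
    {
        if (grabbing && !pressed)
            mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0);
        else if (!grabbing && pressed)
            mouse_event(MOUSEEVENTF_LEFTUP, 0, 0, 0, 0);
        pressed = grabbing;
    }
}
```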
Figure 51: Gesture Factory & Bing Map applications
Chapter 10 Conclusions
10.1 General comments over the development .................................... 96
10.2 Conclusion .................................................................... 96
10.3 Future Work .................................................................. 97
This is the final chapter. It presents the conclusions of the project along with some insights into the development work, and finally outlines what could be done and improved in the future.
10. Conclusions
10.1 General comments over the development
The project took a long time to complete. It involved many phases: learning the technologies, designing the multiple gestures, implementing them, testing them, creating a full application for the tests, designing the feedbacks, running pre-evaluations, redesigning the gestures, running the final evaluation, collecting the data, analyzing them and finally writing the thesis. Some phases went faster than others, but what took most of the time was the implementation of the test application and the tweaking of the gestures to make them as robust and simple to use as possible. The hardest part was balancing the difficulty, the release of the commands and the small variations in the recognition, such as losing a finger or the hand for a few frames. If the release is too easy the command will not hold; if it is too stiff the command will never be deactivated. For a more robust recognition, tricks must be used, such as keeping a history of states and using timers to avoid involuntary sudden changes of state, while not reducing the reactivity of the system too much. Controlling the commands by voice recognition in addition to gestures was considered at the beginning of the project. The idea was dropped early because the voice recognition system was somewhat slow to recognize words and because it would have meant another walkthrough of the five tests, making the whole evaluation too long. This paper tries to gather the important information, but more data, graphs and details are available in the Excel and R files provided with the project.
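The "history of states and timers" trick can be illustrated with a small debouncing helper. The sketch below is only an assumption of how such a filter could be written, not the project's actual code: a command state only changes once the raw detection has held the new value for a short time, which filters out single-frame glitches such as a finger disappearing briefly, at the cost of a small delay in reactivity.

```csharp
// Minimal sketch of the "history plus timer" idea: a command only changes
// state once the raw detector has reported the new state continuously for a
// short hold time, filtering out single-frame recognition glitches.
using System;
using System.Diagnostics;

public class DebouncedState
{
    private readonly TimeSpan holdTime;
    private readonly Stopwatch timer = new Stopwatch();
    private bool candidate;

    public bool Current { get; private set; }

    public DebouncedState(TimeSpan holdTime)
    {
        this.holdTime = holdTime;
    }

    // Feed the raw per-frame detection result; returns the stabilised state.
    public bool Update(bool rawDetection)
    {
        if (rawDetection == Current)
        {
            timer.Reset();                 // nothing to change, drop any pending switch
        }
        else if (!timer.IsRunning || rawDetection != candidate)
        {
            candidate = rawDetection;      // start timing a potential state change
            timer.Restart();
        }
        else if (timer.Elapsed >= holdTime)
        {
            Current = candidate;           // the new state held long enough: commit it
            timer.Reset();
        }
        return Current;
    }
}
```

The hold time is exactly the balance discussed above: a very short one makes the release too easy and the command will not hold, a very long one makes it too stiff and sluggish.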
10.2 Conclusion
Natural user interfaces are becoming part of our everyday devices. It is important to find good gesture designs that are well adapted to their purposes. The main goal for a NUI is to be efficient, but also to be simple and liked by a majority of users: if users do not like the way the controls work, they will not adopt the system and will go elsewhere. The point of this project was to find out whether one type of gestures is better than another. Around twenty gestures were designed and implemented, divided into two groups: the technologic group, meant to be close to the machine and as effective as possible, and the iconic group, which gathers gestures believed to be more natural. Each gesture is designed for one of the four commands: selection, drag and drop, rotation and zoom. During development, some recognition techniques prevailed over others, for instance the rope technique for the rotation and the zoom, which worked better than circle detection. Measuring the angle between a fixed point and the current position and comparing it with the previous frames is quicker, more responsive, more accurate and more versatile than recognizing circles in a path of position points, which is slow, not responsive enough, hard to detect and requires precise movements.
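The sketch below is a reconstruction of how the rope technique just described could be implemented: the vector from a fixed anchor point to the moving finger is measured each frame, the change of its angle drives the rotation and the change of its length drives the zoom. The class and method names are illustrative assumptions, not the project's actual code.

```csharp
// Minimal sketch of the "rope" idea: compare the vector from a fixed anchor
// to the moving finger frame by frame; the angle change gives the rotation
// delta, the length ratio gives the zoom factor.
using System;

public class RopeTracker
{
    private readonly double anchorX, anchorY;   // fixed point set when the command starts
    private double lastAngle, lastLength;

    public RopeTracker(double anchorX, double anchorY, double startX, double startY)
    {
        this.anchorX = anchorX;
        this.anchorY = anchorY;
        (lastAngle, lastLength) = Measure(startX, startY);
    }

    // Returns the rotation delta (radians) and the zoom factor for this frame.
    public (double rotationDelta, double zoomFactor) Update(double x, double y)
    {
        var (angle, length) = Measure(x, y);
        double rotationDelta = angle - lastAngle;

        // Keep the delta in (-pi, pi] so crossing the +/-pi boundary does not
        // produce a huge spurious rotation.
        if (rotationDelta > Math.PI) rotationDelta -= 2 * Math.PI;
        if (rotationDelta < -Math.PI) rotationDelta += 2 * Math.PI;

        double zoomFactor = length / Math.Max(lastLength, 1e-6);

        lastAngle = angle;
        lastLength = length;
        return (rotationDelta, zoomFactor);
    }

    private (double angle, double length) Measure(double x, double y)
    {
        double dx = x - anchorX, dy = y - anchorY;
        return (Math.Atan2(dy, dx), Math.Sqrt(dx * dx + dy * dy));
    }
}
```

Because each frame only requires an arctangent and a square root, this is naturally faster and more responsive than accumulating a path of points and trying to fit a circle to it.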
Not all gestures could be evaluated in the final phase of the project because of the time it would have taken. The application of this project was designed to test and evaluate each command with different gestures. It provides different experiences to users, during which the application monitors behaviors and progress so that the data can be analyzed afterwards. For most users the experience was good but exhausting: practically all of them complained about the fatigue felt in their arms, and most of them think they could improve their performance with more practice. The evaluation provided interesting results. It turns out that iconic gestures generally performed equally well or slightly better, and they were favored by a majority of users. They were especially, and significantly, better for rotations and zooms according to the quantitative results and the Student t-tests. It is not that the technologic gestures were bad; they simply did not meet their audience.
Finally, using the left hand to extend the number of commands of the right hand without decreasing the pointer precision worked well. It demands a certain adaptation from users, who at first keep switching their focus between the right and the left hand, but this fades away with time and practice. When possible, letting users choose the method to achieve a task, rather than imposing one, would be the way to go: the feeling of freedom of choice is always well received and users feel less constrained. However, if only one type of gestures had to be chosen, it would clearly be the iconic one.
10.3 Future work
Several directions are possible to take this project further. A first idea would be to use two Kinect devices instead of one: the first facing the user, the second on the side, perpendicular to the user, to capture lateral movements. This idea was suggested at the beginning of the project but was finally dropped. Another idea would be to use the left hand postures only as an activator of commands, and let the right hand manage the rest. This would let users focus mostly on the pointer rather than on both hands; the hands would also have fewer risks of being close to each other and disturbing the recognition, since the left hand could stay static; and the left arm would be less tired. Of course, a lock system on the objects would be necessary, as the pointer would keep moving during the operations.
The application could use some new levels and objects. Speed and endurance tests could be added, counting for example the number of selections or of full rotations users can perform in one minute; the results could then be compared with other gestures or modalities to assess their effectiveness. The Move level could be changed so that the target zone is reduced to a simple circle: the recorded data could then be used to compute the index of performance (see the formulation sketched after this paragraph) and compare the results with other drag and drop activities. Some improvements can be made around the editable levels, such as adding new options. Levels could have scores and multiple objects on screen like the training, but the evaluation would then have to be completely redesigned. The rotation could be done differently: for now it is always performed around the center of the object, and an improvement would be to rotate the object around the pointer position. Finally, some sounds could be added to provide quick, simple feedback.
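The index of performance mentioned above is not defined in this report; assuming the usual Shannon formulation of Fitts' law, it could be computed as follows, where D is the distance to the target centre, W the target width and MT the measured movement time:

\[
ID = \log_2\!\left(\frac{D}{W} + 1\right) \ \text{[bits]}, \qquad IP = \frac{ID}{MT} \ \text{[bits/s]}
\]

With the Move level reduced to circular targets, each recorded drag and drop would yield one (ID, MT) pair, and the resulting IP values could be compared across gestures or against mouse-based baselines.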
11. References
1. Daniel Binney, Jan Boehm (supervisor), Performance Evaluation of the PrimeSense IR Projected Pattern Depth Sensor, MSc Surveying, UCL Department of Civil, Environmental and Geomatic Engineering, Gower St, London, WC1E.
2. Jarrett Webb and James Ashley, Beginning Kinect Programming with the Microsoft Kinect SDK, Apress, 2012.
3. http://candescentnui.codeplex.com
4. http://hci.rwth-aachen.de/tiki-download_wiki_attachment.php?attId=1508
5. http://www.primesense.com/technology
6. http://www.renauddumont.be/fr/2012/kinect-pour-windows-vs-kinect-pour-xbox
7. http://en.wikipedia.org/wiki/Kinect
Appendix
Candescent License
New BSD License (BSD)
Copyright (c) 2011, Stefan Stegmueller. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of Candescent.ch nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Welch correction (decision tree)
Source: http://biol09.biol.umontreal.ca/BIO2041e/Correction_Welch.pdf
1. Sample sizes equal? Yes: go to 2. No: go to 6.
2. Equal sample sizes. Distribution: Normal: go to 3. Skewed: go to 5.
3. Normal distributions. THV result: Variances homogeneous: use any one of the 3 tests (most simple: parametric t-test). Variances unequal: go to 4.
4. Variances unequal. Sample size: Small: use the t-test with Welch correction. Large: use any one of the 3 tests (most simple: parametric t-test).
5. Skewed distributions. THV result: Variances homogeneous: all 3 tests are valid, but the permutational t-test is preferable because it has correct type I error and the highest power. Variances unequal: normalize the data or use a nonparametric test (Wilcoxon-Mann-Whitney test, median test, Kolmogorov-Smirnov two-sample test, etc.).
6. Unequal sample sizes. Distribution: Normal: go to 7. Skewed: go to 8.
7. Normal distributions. THV result: Variances homogeneous: use the parametric or permutational t-tests (most simple: parametric t-test). Variances unequal: use any one of the 3 tests (most simple: parametric t-test). Power is low when the sample sizes are strongly unequal; avoid the Welch-corrected t-test in the most extreme cases of sample size inequality (lower power).
8. Skewed distributions. THV result: Variances homogeneous: use the permutational t-test. Variances unequal: normalize the data or use a nonparametric test (Wilcoxon-Mann-Whitney test, median test, Kolmogorov-Smirnov two-sample test, etc.).