Moving the Mouse Pointer Using Eye Gazing

Computer Science Mini Thesis

Ibraheem Frieslaar

2734760

Supervisors: Mr Mehrdad Ghaziasgar and Mr James Connan

B.Sc. Honours

Department of Computer Science, University of the Western Cape, 2011

ABSTRACT:

With a growing number of computer devices around us, and the increasing time we spend on interacting with such devices, we are strongly interested in finding new interaction methods which ease the use of computers or increase interaction efficiency. Eye tracking seems to be a promising technology to achieve this goal. This project investigates the use of eye gaze tracking techniques to allow a user to move the mouse pointer using his or her eyes.


TABLE OF CONTENTS

ABSTRACT
CHAPTER 1: INTRODUCTION
   1.1 Human–computer interaction (HCI)
   1.2 Current Research
CHAPTER 2: USER REQUIREMENTS DOCUMENT
   2.1 User's view of the problem
   2.2 Description of the problem
   2.3 Expectations from the software solution
   2.4 Not expected from the software solution
CHAPTER 3: REQUIREMENTS ANALYSIS DOCUMENT
   3.1 Designer's interpretation
   3.2 Complete Analysis of the problem
   3.3 Current Solution
   3.4 Suggested Solution
CHAPTER 4: USER INTERFACE SPECIFICATION (UIS)
   Training Phase
   Using the System
CHAPTER 5: HIGH LEVEL DESIGN
   5.1 High Level Definition of Concepts
   5.2.1 Relationship between objects
   5.2.2 High Level Design of Eye Gazing and Mouse Movement Diagram
CHAPTER 6: LOW LEVEL DESIGN
   6.1 Low Level Data Dictionary
   6.2 Low Level Design of Eye Gazing and Mouse Movement Diagram
   6.3 Detailed Methodology
      6.3.1 Haar-like features
      6.3.2 Calculating the centre of the nose and calculating the location of the iris
      6.3.3 HSV
      6.3.4 Dynamic Threshold
      6.3.5 Locating the Pupil
      6.3.6 SVM
CHAPTER 7: CODE DOCUMENTATION
CHAPTER 8: TESTING
CHAPTER 9: CONCLUSION
LIST OF FIGURES
REFERENCES


CHAPTER 1

INTRODUCTION

1.1 Human–computer interaction (HCI):

HCI is the study, planning and design of the interaction between users and computers. It is often regarded as the intersection of computer science, behavioural sciences, design and several other fields of study [1]. Interaction between users and computers occurs at the user interface. New modalities for computer interaction have emerged, such as speech interaction, input by gestures, and input by tangible objects with sensors. A further input modality is eye gaze, which nowadays finds its application in accessibility systems. Such systems typically use eye gaze as the sole input, but outside the field of accessibility eye gaze can be combined with any other input modality. Therefore, eye gaze could serve as an interaction method beyond the field of accessibility. The aim of this work is to investigate the use of eye gaze for HCI.

1.2 Current Research:

An eye-gaze interface seems to be a promising candidate for a new interface technique, which may be more convenient than the ones we use. Traditionally, disabled people who cannot move anything except their eyes use eye gaze interaction. These systems are designed to direct the computer solely by the eyes. Such systems work well and are a great help for people who need them, but for others they are cumbersome and less efficient than keyboard and mouse.

An eye-gaze interface might offer several potential benefits:

I. A benefit of eye tracking could be reduced stress for hand and arm muscles by transferring the computer input from the hand to the eyes. This need not necessarily put extra load on the eye muscles, because for most interactions the eyes move anyway; for example, when clicking a button on the screen, in most cases the mouse and the eyes move to the target.

II. Video-based eye tracking works contact-free, which means that little maintenance is necessary. There is no need to clean the device, which is a typical problem for keyboards and mouse devices. Placing the camera behind strong transparent material results in a vandalism-proof interface, which is nearly impossible to realize for keyboards and mouse devices.

III. The eyes tell a lot about what somebody is doing. Tracking the eyes provides useful information for context-aware systems. In the simplest form an eye tracker tells where the attention is, which already has big potential for the implementation of context-awareness. Simple analysis of the eye tracker data can detect activities like reading. More sophisticated analysis could reveal the physical or emotional condition of a user, her or his age, and degree of literacy [2].

There are also possible problems:

I. The eyes perform unconscious movements and this might disturb their use as computer input. It is not clear to which degree people are able to control the movement of their eyes. The ability to control the eyes consists of both suppressing unintended movements and performing intended eye movements. It seems that we are at least able to control where we look, because this is required by our social protocols. However, it is not clear whether we can train the motor skills of the eye muscles to the same extent as we can train the fingers for playing the guitar [2].

II. Misinterpretation by the gaze interface can trigger unwanted actions based on noise.

III. From other input devices we know that extensive use of particular muscles or muscle groups can cause physical problems called RSI (repetitive strain injury) [8]. There are fears that this might happen to the eye muscles too.

In this project we intend to capture the eye gaze location once the user has correctly placed his face in front of the web camera, and to translate the location of the pupil into a mouse position.


CHAPTER 2

USER REQUIREMENTS DOCUMENT

This chapter focuses on viewing the problem from the user's perspective. The solution is based on the vital information acquired from the user.

2.1 User's view of the problem

The user requires a system that determines where the user is looking with a high degree of accuracy. High accuracy means the ability to place the mouse within 32 pixels of the actual location, since this is a standard icon size. The proposed hardware and software requirements are:

i. A web camera
ii. Windows 7 Operating System
iii. Visual Studio 2010
iv. Open Source Computer Vision Library (OpenCV)
v. Library for Support Vector Machines (LIBSVM)

2.2 Description of the problem

Disabled people who cannot move anything except their eyes need a system that allows them to use their eyes to move the mouse to the position at which they are gazing. However, people who have no disabilities would also want such a system, since it is an alternative way to interact with computers.


2.3 Expectations from the software solution

The user requires a system that determines where the user is looking with a high degree of accuracy. The system should run in the background, track the eyeball without user intervention, and move the mouse to the appropriate position according to the user's eye gaze. However, for this project this will only be possible for a specific number of points on the screen.

2.4 Not expected from the software solution

The system is not expected to process an image containing more than one human head at a time. People wearing spectacles will not be supported. The software is not expected to handle an infinite number of points on the screen.


CHAPTER 3

REQUIREMENTS ANALYSIS DOCUMENT

3.1 Designer's interpretation

The eyes perform unconscious movements and this might disturb their use as computer input. It is not clear to which degree people are able to control the movement of their eyes. The difficulty lies in identifying the eye correctly. Accurate data is required to make the correct decision.

3.2 Complete Analysis of the problem

I. A web camera is required to capture the user's face in real time.
II. From the image, Haar-like features are used to determine the user's face.
III. The face is set as a region of interest (ROI).
IV. Once the face is set as the ROI, we determine the nose location by taking half the width and height of the ROI [3].
V. Using the location of the nose we can determine where the left eye is located.
VI. The left eye is now set as the new ROI.
VII. The image is converted to HSV. The HSV colour space is a popular colour space for skin detection since it is based on human colour perception [3].
VIII. The value channel (V of HSV) is extracted and a threshold is applied to that image to determine the area of the pupil.
IX. The threshold image is scanned to find the last black pixel; that pixel will be the bottom point of the iris.
X. The coordinates of the iris are then sent to a Support Vector Machine (SVM).
XI. Based on the SVM's prediction the mouse will move to the trained location.

3.3 Current Solution:

I. Lenovo's eye-controlled notebook. Developed in conjunction with Swedish-based Tobii, this computer is the first fully functional prototype of Tobii's eye-tracking technology, which allows users to zoom, automatically centre on what they are focusing on, auto-dim and brighten the display, and scroll through documents with their eyes. [4]

II. Sweden's eye tracking and control innovator Tobii has released a stand-alone eye control device called PCEye. The device tracks eye movement and translates it into mouse cursor action on screen. It is positioned in front of a PC monitor and connected via USB. [5]

3.4 Suggested Solution:

The suggested solution aims to be cost-effective. All that is needed is a basic web camera and the OpenCV libraries installed.


CHAPTER 4

USER INTERFACE SPECIFICATION (UIS)

There are two phases to the system: the training phase and the actual use of the system.

The system starts up with a simple two-button graphical user interface (GUI). The user has the option to use the system or to train the system on his eyes.


Training Phase

The training phase starts off with a black image showing the user's face; this helps the user to adjust his position. Once he is in the correct position, he looks at the green circle in the top right corner, focuses on that point, and then clicks the mouse to record the coordinates (Figure 1).

Figure 1 Calibration


Once the user clicks, another green circle will appear; the user then looks at the new point and clicks the mouse to record the position (Figure 2).

Figure 2 Second Point


Once the user clicks, another green circle will appear; the user then looks at the new point and clicks the mouse to record the position (Figure 3). This process will continue until 12 points have been recorded.

Figure 3 Third Point


As we can see, the process continues until 12 points have been recorded (Figure 4). Once the user clicks for the 12th point, the training program will close. In the background the saved coordinates will be sent to the SVM to be trained, and the trained model will then be copied to the actual system, which will be called. A sketch of how such a calibration sample could be written out for training is shown below.

Figure 4 12 Points
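As an illustration of what "sending the coordinates to the SVM" could involve, the sketch below appends one recorded calibration sample to a text file in LIBSVM's sparse format. The file name, and the idea of using the calibration point's index (1 to 12) as the class label, are assumptions for illustration rather than the project's actual code.

#include <stdio.h>

/* Sketch only: append one calibration sample to a training file in LIBSVM's
   sparse text format ("<label> 1:<x> 2:<y>").  The file name and the use of
   the calibration point index as the class label are illustrative assumptions. */
static int saveCalibrationSample(int pointIndex, int irisX, int irisY)
{
    FILE *f = fopen("gaze_train.txt", "a");
    if (f == NULL)
        return -1;
    fprintf(f, "%d 1:%d 2:%d\n", pointIndex, irisX, irisY);
    fclose(f);
    return 0;
}

Once all 12 points have been recorded, such a file can be trained with LIBSVM's bundled command-line tool, for example "svm-train gaze_train.txt gaze.model", producing the model file that the main system later loads.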


Using the System

To use the system the user clicks on the button labelled "Use System". The main system runs as a daemon: all the processing runs in the background, and the user is only required to gaze at the desired location.


CHAPTER 5

HIGH LEVEL DESIGN

In this chapter we look at the system from a high level of abstraction, while the low level view follows in the next chapter. Since the system was programmed in C/C++ we do not have an object-oriented analysis, so we will not include a class diagram.

5.1 High Level Definition of Concepts

The following describes the various concepts used in this research.

OpenCV: OpenCV is an open source computer vision library. It provides functions and structures that can be used to implement image processing applications.

Haar-like features: A method to detect objects, such as a face, within an image.

Region of Interest (ROI): A rectangular area in an image which is segmented out for further processing.

Greyscale Colour Space: Greyscale is a single channel colour space. The conversion from a colour image to a shade of grey is established by calculating the effective brightness or luminance of the colour. This value is then used to create the shade of grey that corresponds to the desired brightness (a common weighting is given after this table).

HSV (Hue, Saturation, Value) Colour Space: HSV is a three channel colour space. It describes colour in terms of Hue, Saturation and Value (also known as Lightness or Intensity). [3]

Support Vector Machine (SVM): An SVM is a set of related supervised learning methods that analyse data and recognize patterns, used for classification and regression analysis. The standard SVM takes a set of input data and predicts, for each given input, which possible class the input is a member of. [7]
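As a concrete point of reference for the greyscale conversion described above, one widely used luminance weighting (the weighting applied by OpenCV's CV_BGR2GRAY conversion; it is assumed here only as an example, since this document does not state which formula is used) is:

Y = 0.299 R + 0.587 G + 0.114 B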


5.2.1 Relationship between objects

In the figure below the relationships between the objects are identified.

Input Frame → (1:1) Haar-like Features → (1:1) ROI → (1:1) SVM → (1:1) Mouse Movement

Figure 5: Relationships between objects


5.2.2 High Level Design of Eye Gazing and Mouse Movement Diagram

In this figure the key component of the system is slightly elaborated. It shows a high-level view of how the eyes are tracked in the system using computer vision methods.

Input Webcam → Image Processing → Move Mouse Pointer

Figure 6 High Level Design of System


CHAPTER 6

LOW LEVEL DESIGN

6.1 Low Level Data Dictionary

Haar-like features: Using HaarDetectObjects we are able to detect the user's face within the frame.

ROI: We then use SetImageROI to set a region of interest on the face obtained from the Haar-like features.

Calculating the centre of the nose: midX = (int)(r->width/2); midY = (int)(r->height/2);

Calculating the location of the iris: Based on the position of the centre of the nose we can determine in which area the iris is located: cvPoint(midNoseX[midNoseIndexX-1], midNoseY[midNoseIndexY-1]/2) and cvPoint(midNoseX[midNoseIndexX-1]/3, midNoseY[midNoseIndexY-1]/2)

HSV Conversion: The area of the iris is converted to HSV: cvCvtColor(irisRoi, hsv, CV_BGR2HSV);

Extraction: The value channel (V) of the HSV image is extracted: cvSplit(hsv, hue, sat, val, 0);

Dynamic Threshold: A dynamic threshold is applied to the value image based on the number of black pixels in the image: cvThreshold(val, val, thresh, 255, CV_THRESH_BINARY);

Calculating the pupil location: We loop through the threshold image to determine where the last black pixel is located. That black pixel will be the bottom point of the iris.

SVM: The coordinates of the iris are sent to the SVM, which determines which position the eye is gazing at.

Mouse Movement: SetCursorPos(x, y) moves the mouse based on the SVM prediction.


6.2 Low Level Design of Eye Gazing and Mouse Movement Diagram

Input Webcam → Image Processing → Haar-like Features → ROI → Calculating Centre of the Nose → Calculating Area of the Iris → HSV Conversion → Extraction → Dynamic Threshold → Calculating the Pupil Location → SVM → Move Mouse Pointer

Figure 7 Low Level Design of System


6.3 Detailed Methodology

This section describes the methodology used to create this system by elaborating on the following key computer vision techniques:

- Haar-like features
- Calculating the centre of the nose and calculating the location of the iris
- HSV
- Dynamic Threshold
- Locating the pupil
- SVM

6.3.1. Haar-like features

Viola and Jones [6] adapted the idea of using Haar wavelets and developed the so-called Haar-like features. A Haar-like feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in these regions and calculates the difference between them. This difference is then used to categorize subsections of an image. The key advantage of a Haar-like feature over most other features is its calculation speed; it minimizes computation time while achieving high detection accuracy [6].
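The following is a minimal sketch of how a Haar cascade can be applied to a webcam frame with the OpenCV 2.x C API used elsewhere in this document; the cascade file name, camera index and detector parameters are assumptions for illustration, not the project's exact values.

#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <stdio.h>

int main(void)
{
    /* Assumed cascade file and camera index. */
    CvHaarClassifierCascade *cascade =
        (CvHaarClassifierCascade *)cvLoad("haarcascade_frontalface_alt.xml", 0, 0, 0);
    CvMemStorage *storage = cvCreateMemStorage(0);
    CvCapture *capture = cvCaptureFromCAM(0);

    if (!cascade || !storage || !capture)
        return 1;

    IplImage *frame = cvQueryFrame(capture);   /* owned by the capture, do not release */
    if (frame)
    {
        /* Detect faces; the first rectangle is taken as the user's face. */
        CvSeq *faces = cvHaarDetectObjects(frame, cascade, storage,
                                           1.1, 3, CV_HAAR_DO_CANNY_PRUNING,
                                           cvSize(40, 40), cvSize(0, 0));
        if (faces && faces->total > 0)
        {
            CvRect *r = (CvRect *)cvGetSeqElem(faces, 0);
            printf("Face at (%d, %d), size %dx%d\n", r->x, r->y, r->width, r->height);
        }
    }

    cvReleaseCapture(&capture);
    cvReleaseMemStorage(&storage);
    cvReleaseHaarClassifierCascade(&cascade);
    return 0;
}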

6.3.2. Calculating the centre of the nose and calculating the location of the iris

The method used to determine the centre of the nose is as follows:


NosePointX = FaceROIWidth / 2
NosePointY = FaceROIHeight / 2

These two equations give us the coordinates of the centre of the nose. Once we have these coordinates we can determine the area in which the left iris lies. We need three more points: the top of the nose (p1), the starting point of the eyebrow (p2) and the bottom starting point of the eye (p3). The three equations follow:

[Equations for p1, p2 and p3, together with a small diagram of the three points; each point is derived as a fixed fraction of the nose-centre coordinates (cf. the cvPoint expressions in Section 6.1).]
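A small sketch of how these fractions can be turned into a rectangle around the left eye/iris region, based on the cvPoint expressions listed in Section 6.1, is shown below; the exact bounds used by the project may differ, so the fractions here should be read as assumptions.

#include <opencv/cv.h>

/* Assumed helper: given the face rectangle from the Haar detector, estimate
   the centre of the nose and derive a rectangle around the left eye / iris. */
static CvRect irisRegionFromFace(CvRect face)
{
    int midX = face.width / 2;    /* centre of the nose, x within the face ROI */
    int midY = face.height / 2;   /* centre of the nose, y within the face ROI */

    /* Assumed bounds, following Section 6.1: the region runs from one third of
       the nose-centre x up to the nose centre, in the upper half of the face. */
    int left   = midX / 3;
    int right  = midX;
    int top    = midY / 2;
    int bottom = midY;

    return cvRect(face.x + left, face.y + top, right - left, bottom - top);
}

The returned rectangle can then be passed to cvSetImageROI before the HSV conversion described in the next section.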

6.3.3. HSV

The HSV colour space is a popular colour space for skin detection since it is based on human colour perception [3]. It describes colour in terms of Hue, Saturation and Value (also known as Lightness or Intensity). Hue defines the dominant colour of an area, whereas Saturation measures the degree of the dominant colour of an area in proportion to its brightness. Value is related to the colour luminance, thereby storing the brightness information.

6.3.4. Dynamic Threshold

In order to achieve a clean image of just the user's pupil, we need a dynamic threshold, since lighting conditions differ from day to day and with the time of day. A default threshold of 50 is applied to the image, and we then loop through the thresholded image to count the number of black pixels; we have determined by trial and error that the number of black pixels should be between 1000 and 1200. Thus, if the number of black pixels is less than 1000, the threshold is increased and checked again; the system only proceeds to the next phase once the black pixel count is in the desired range. The same applies if the black pixel count is over 1200, except that the threshold is decreased.
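A minimal sketch of this adjustment loop is given below, assuming the value channel of the eye region is a single-channel 8-bit image; the starting value of 50 and the 1000-1200 pixel window come from the text above, while the helper name and structure are assumptions.

#include <opencv/cv.h>

/* Sketch of the dynamic threshold: adjust the threshold until the binary
   image of the value channel contains between 1000 and 1200 black pixels. */
static int applyDynamicThreshold(IplImage *val, IplImage *bin)
{
    int thresh = 50;                      /* default starting threshold */
    for (;;)
    {
        cvThreshold(val, bin, thresh, 255, CV_THRESH_BINARY);

        /* Black pixels are the zero-valued pixels of the binary image. */
        int black = bin->width * bin->height - cvCountNonZero(bin);

        if (black < 1000 && thresh < 255)
            ++thresh;                     /* too few black pixels: raise the threshold */
        else if (black > 1200 && thresh > 0)
            --thresh;                     /* too many black pixels: lower the threshold */
        else
            return thresh;                /* count is in the desired range (or a limit was reached) */
    }
}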

6.3.5. Locating the Pupil

Once we have a clean thresholded image we go through the image to find the bottommost black pixel; this point will be the bottom centre point of the pupil. To accommodate slight user movements and image jitter, the position of the pupil is compared to the position of the pupil in the next frame: if the position has changed by a large amount, the location of the pupil is updated to the new value; if the position has moved only slightly, the position remains the same.

[Figure: bottom centre point of the pupil]
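The scan and the jitter filter described above might look like the following sketch (an assumed helper, not the author's exact code): the thresholded image is scanned from the bottom row upwards, and the new pupil position is only accepted if it differs from the previous one by more than a chosen number of pixels.

#include <opencv/cv.h>
#include <stdlib.h>

/* Assumed helper for Section 6.3.5: return the bottommost black pixel of the
   thresholded eye image, taken as the bottom centre point of the pupil. */
static CvPoint locatePupil(const IplImage *bin, CvPoint previous, int jumpThreshold)
{
    int x, y;
    for (y = bin->height - 1; y >= 0; --y)
    {
        const unsigned char *row =
            (const unsigned char *)(bin->imageData + y * bin->widthStep);
        for (x = 0; x < bin->width; ++x)
        {
            if (row[x] == 0)                       /* first black pixel from the bottom */
            {
                CvPoint current = cvPoint(x, y);
                int moved = abs(current.x - previous.x) + abs(current.y - previous.y);
                /* Large jumps are accepted as real movement; small changes are
                   treated as noise and the previous position is kept. */
                return (moved > jumpThreshold) ? current : previous;
            }
        }
    }
    return previous;   /* no black pixel found: keep the last known position */
}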

6.3.6. SVM

SVMs are a useful technique for data classification. An SVM is a set of related supervised learning methods that analyse data and recognize patterns, used for classification and regression analysis.

The standard SVM takes a set of input data and predicts, for each given input, which possible class the input is a member of [7]. The SVM allows us to accurately predict which region the user is gazing at.
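A sketch of the prediction step using LIBSVM's C interface and the Win32 SetCursorPos call is shown below. The model file name, the assumption that the trained class labels are 1 to 12, and the 4×3 grid of screen coordinates are all illustrative; in the real system the screen positions correspond to the 12 calibration points recorded during training.

#include <windows.h>   /* SetCursorPos */
#include "svm.h"       /* LIBSVM */

/* Assumed helper: classify one iris coordinate pair and move the cursor to the
   screen point associated with the predicted class (labels 1..12). */
int moveCursorFromGaze(int irisX, int irisY)
{
    static struct svm_model *model = 0;
    if (!model)
    {
        model = svm_load_model("gaze.model");   /* assumed model file name */
        if (!model)
            return -1;
    }

    /* LIBSVM expects a sparse feature vector terminated by index -1. */
    struct svm_node x[3];
    x[0].index = 1;  x[0].value = (double)irisX;
    x[1].index = 2;  x[1].value = (double)irisY;
    x[2].index = -1;

    int cls = (int)svm_predict(model, x);
    if (cls < 1 || cls > 12)
        return -1;

    /* Illustrative 4x3 grid on a 1280x1024 screen; the real coordinates come
       from the calibration points recorded in the training phase. */
    int col = (cls - 1) % 4;
    int row = (cls - 1) / 4;
    int sx  = 160 + col * 320;
    int sy  = 170 + row * 340;

    return SetCursorPos(sx, sy) ? 0 : -1;
}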


CHAPTER 7

CODE DOCUMENTATION

The full code documentation for this project is not contained in this document due to the number of pages it covers. The code documentation, however, is found on the accompanying Compact Disc. Within the code documentation, the function of each class and the methods contained within them are described.


CHAPTER 8

TESTING

In Chapter 4 we describe a basic way in which the user should interact with the system. In this chapter, we document the process of testing both the user interface and the system, to ensure that it meets the requirements and that the system is robust.

PLANNING THE TEST

The environment in which testing took place was the Computer Science Honours Lab. It was the only available room with all the equipment needed. Three participants were asked to participate in this testing phase, one female and two male.

PROCEDURE

Testing the System

- The subject sat in the same position as when he trained the system.
- The subject looked at each point on the screen in a systematic order, from point 1 to 12.
- Each time the subject looked at a different point, two things were recorded: the time the mouse took to move to the specific point, and how many jumps occurred before the mouse moved to the correct location. The time the mouse took to move to the correct position was divided into three bins: (1) less than 1 second, (2) 1-2 seconds, (3) 3 or more seconds.
- This test was carried out 10 times on each subject.

Testing Results

Time:

The following tables show, for each of the 12 points and each test run, how long the mouse took to move to the correct position, recorded as the time bins defined above.


Subject 1:

Legend: 1 = less than 1 second, 2 = 1-2 seconds, 3 = 3 or more seconds. Each row lists the recorded bins for points 1-12 in order (some runs list fewer than 12 values in the source).

Test I:    1 1 1 1 1 1 2 1 1 1 2
Test II:   1 1 1 1 1 1 2 1 1 1 2
Test III:  1 1 1 1 1 1 1 1 1 1 1 3
Test IV:   1 1 1 1 1 1 2 1 1 1 2
Test V:    1 1 1 1 1 1 1 1 1 1 1 3
Test VI:   1 1 1 1 1 1 1 1 1 1 1
Test VII:  1 1 1 1 1 1 2 1 1 1 2
Test VIII: 1 1 1 1 1 1 2 1 1 1 2
Test IX:   1 1 1 1 1 1 1 1 1 1 1 3
Test X:    1 1 1 1 1 1 2 1 1 1 2

Table 1 Subject 1 Time Taken For Mouse to Move

Subject 2:

Tests I to X all produced the same sequence of bins across the points, in order:

1 2 1 1 1 1 1 1 2 1 1

Table 2 Subject 2 Time Taken For Mouse to Move


Subject 3:

Tests I to X all produced the same sequence of bins across the points, in order:

1 1 2 1 1 1 1 1 1 2 1

Table 3 Subject 3 Time Taken For Mouse to Move

[Figure 8 Time Taken For Mouse to Move: bar chart of the percentage of trials falling into each time bin (1 = less than 1 second, 2 = 1-2 seconds, 3 = 3 or more seconds); bins 2 and 3 account for approximately 14.44% and 0.83% of trials respectively, with the remainder in bin 1.]

In order to calculate accuracy, the number of points the mouse actually moved to correctly was divided by the total number of points the subject looked at, i.e. Accuracy = (correct points / total points) × 100%. There are two types of accuracy: per-user accuracy and overall system accuracy.
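As a purely hypothetical illustration (these are not the measured results): if a subject's gaze moved the mouse to the correct point on 110 of the 120 recorded trials (12 points × 10 tests), the per-user accuracy would be 110 / 120 × 100 ≈ 91.7%.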

Per User Accuracy:

Subject 1 Accuracy: ( )
Subject 2 Accuracy: ( )
Subject 3 Accuracy: ( )

Complete System Accuracy

System Accuracy: ( )


CHAPTER 9

CONCLUSION

This documentation describes the process of producing a real-time system that uses eye gazing to move the mouse pointer. The requirements were gathered, analysed, designed, coded and tested, and these steps were iterated a number of times. We are happy to state that the system has satisfied the requirements gathered, and we are excited about the room for future extensions to this project. In conclusion, this mini-thesis closes with special thanks to Mr Mehrdad Ghaziasgar, Mr James Connan and the University of the Western Cape for giving us the opportunity to work on such an exciting and practical project that has great promise for the future.


LIST OF FIGURES

Figure 1 Calibration
Figure 2 Second Point
Figure 3 Third Point
Figure 4 12 Points
Figure 5 Relationships between objects
Figure 6 High Level Design of System
Figure 7 Low Level Design of System
Figure 8 Time Taken For Mouse to Move


REFERENCES

[1] Hewett, Baecker, Card, Carey, Gasen, Mantei, Perlman, Strong and Verplank, "ACM SIGCHI Curricula for Human-Computer Interaction," ACM SIGCHI, ISBN 0-89791-474-0, 2009.
[2] Heiko Drewes, "Eye Gaze Tracking for Human Computer Interaction," Ludwig-Maximilians-Universität, Munich, 2010.
[3] Imran Achmed and James Connan, "Upper Body Pose Estimation towards the translation of South African Sign Language," University of the Western Cape, Cape Town, 2010.
[4] Chris Ziegler. (2011, April) Engadget. [Online]. http://www.engadget.com/2011/03/01/tobii-and-lenovo-show-off-prototype-eye-controlled-laptop-we-go/
[5] Tobii Technology. (2011, April) Tobii Technology. [Online]. http://www.tobii.com/pceye
[6] Paul Viola and Michael Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
[7] John Shawe-Taylor and Nello Cristianini, Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, 2000.
[8] James E. McGreevey, Cumulative Trauma Disorders in Office Workers, Public Employees Occupational Safety and Health Program, 2003.
[9] Codeback. Colour Graphics to Grayscale Algorithm. [Online]. http://codeback.net/color-graphics-to-grayscale-algorithm

