Hand gesture based user interface for computer using a camera and projector

Syed Akhlaq Hussain, Ali Ahmed, Iftekhar Mahmood and Khurram Khurshid

Abstract—In this paper, we propose a hand gesture based human computer interaction system comprising a webcam and a pocket projector. The projector projects the display on a wall or any other plain surface. The user can interact with the projected screen using his fingertips, which are tracked in the air by the camera using the 'camshift' tracker. A comparative study of different methods of hand and fingertip detection has been made. A robust method has been developed to detect and recognize single-stroke gestures traced with the fingertips, which are then translated into actions.

Keywords—Hands detection, hand tracking, fingertips detection, gesture recognition.

S. A. Hussain and A. Ahmed are with the Institute of Space Technology, Islamabad Highway, near Rawat, 44000, Islamabad, Pakistan ([email protected]; [email protected]). I. Mahmood and K. Khurshid are with the Department of Communication Systems Engineering, Institute of Space Technology, Islamabad Highway, near Rawat, 44000, Islamabad, Pakistan ([email protected]; [email protected]).

I. INTRODUCTION

As the need for mobility grows, computing devices are becoming smaller and easier to carry. With this miniaturization we can access information anywhere and everywhere, but it does not allow input interfaces with the wide operational area naturally required by humans. On the other hand, multi-touch and gesture-based input interfaces do offer a wider operational area and provide a very intuitive and natural interactive experience, letting the user interact directly with information; unfortunately, these interfaces fail to provide mobility [1]. Vision- and gesture-based human computer interaction is an ongoing research area, and significant research has been carried out to make human computer interaction friendly and intuitive. T. Singh [2] proposed finger tracking to control the mouse pointer using a single camera. L. Jin [3] suggested a vision based finger-writing character recognition system in which the trajectories of the fingertip are tracked, reconstructed as character patterns and then recognized. Y. Liu [4] presented hand gesture-based text input for wearable computers, where characters written with the fingertip are recognized using B-spline based character recognition. A. Sanghi [5] proposed a fingertip detection and tracking system as a virtual mouse and presented its application as a signature input device and an application selector. V. Vezhnevets [6] presented a survey of pixel-based skin color detection techniques. In [11], a vision-based input interface for mobile devices is presented in which the 3D motion of human fingertips is tracked using a single high frame rate camera and small input gestures are recognized in the air, and P. Mistry is developing WUW [1], in which hand gestures are used as the interaction mechanism with information projected on any surface, but it requires the user to wear colored markers. Our proposed system uses hand gestures as a tool of interaction with the projected information, and thus provides mobility and an intuitive interface in one package. In our research, fingertips are extracted from the segmented hand region and single-stroke gestures are recognized. These gestures are then translated into actions.

II. DESCRIPTION OF PROJECT

The system comprises a camera, a pocket projector and a mobile computing device, as shown in Fig. 1. The projector projects the display on a wall or any other plain surface, and the camera detects and tracks the fingertips using various digital image processing and computer vision techniques, which are discussed later. Single-stroke gestures are interpreted into actions, thus providing an interaction mechanism for the projected applications. The block diagram of the system is shown in Fig. 2.

Fig. 1 System configuration

The paper is organized as follows. The discussion begins with a description of the techniques used for hand segmentation and hand tracking, followed by an explanation of fingertip detection from the segmented hand. In the end, we discuss gesture description and gesture recognition.


Fig. 2 Block diagram of gesture-based user interface system

The very first task is to detect the presence of a human hand in the field of view of the camera.

III. HANDS SEGMENTATION

Since the human hand has a very complex geometry and possesses high variability, we have used skin color detection for hands segmentation, because human skin color has its own specific features. Skin color detection depends significantly on the chosen color space and on the skin color distribution model. Different color spaces have different skin and non-skin overlap. A database of 10856 skin pixels, taken from 50 images of different people under different illumination conditions, has been used for skin distribution modeling and for color space selection. The YCrCb color space showed the best results, as it has the least skin and non-skin overlapping area, which is shown in Fig. 3.

Fig. 3 Skin color distribution (a) YCrCb color space (b) YUV color space (c) RGB color space

Using the standard transformation described in [6], RGB is converted to YCrCb color space:

Y = 0.299R + 0.587G + 0.114B, Cr = R - Y, Cb = B - Y   (1)

We modeled the skin color distribution in the chrominance axes of the YCrCb color space with single Gaussian and elliptical boundary models. Modeling in the chrominance space caters for scene illumination variations [7]. On the basis of the skin color distribution obtained from the modeling, a pixel is classified as a 'skin' or 'non-skin' pixel. The flow chart of skin detection is shown in Fig. 4.

Fig. 4 Flow chart of skin detection

A. Single Gaussian model

The chrominance distribution is modeled by the Gaussian distribution described in [6]:

P(c|\text{skin}) = \frac{1}{2\pi |\Sigma|^{1/2}} \, e^{-\lambda/2}   (2)

\lambda = (c - \mu)^T \Sigma^{-1} (c - \mu)   (3)

\mu = \frac{1}{n} \sum_{j=1}^{n} c_j, \qquad \Sigma = \frac{1}{n-1} \sum_{j=1}^{n} (c_j - \mu)(c_j - \mu)^T

where c is the color vector, \Sigma is the covariance matrix and \mu is the mean vector calculated from the training data. Pixels are classified as skin/non-skin pixels through thresholding: if \lambda \geq 15, a pixel is considered a skin pixel.

B. Elliptical boundary model

Lee and Yoo [8] concluded in their work that the skin color distribution is approximately elliptical in shape and provides superior results to the Gaussian distribution. The elliptical boundary model is defined as

\Phi(c) = (c - \psi)^T \Lambda^{-1} (c - \psi)   (4)

where

\psi = \frac{1}{n} \sum_{i=1}^{n} c_i, \qquad \mu = \frac{1}{N} \sum_{i=1}^{n} f_i c_i

\Lambda = \frac{1}{N} \sum_{i=1}^{n} f_i (c_i - \mu)(c_i - \mu)^T, \qquad N = \sum_{i=1}^{n} f_i

Here, n is the number of distinctive training vectors c_i, and f_i is the number of skin samples of color vector c_i. Pixels are classified as skin pixels if \Phi(c) > 6.1. Results obtained from both models are shown in Fig. 5.
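To make the classification step concrete, a minimal Python sketch of the single Gaussian test of Eqs. (2) and (3) is given below. This is our illustration, not the authors' code: the helper names are hypothetical, OpenCV's built-in YCrCb conversion is used in place of Eq. (1) (it applies the standard scaling rather than the plain R - Y, B - Y differences), and the λ ≥ 15 rule follows the text above.

```python
import cv2
import numpy as np

def fit_gaussian(skin_crcb):
    """Estimate mean vector and covariance matrix from (N, 2) Cr/Cb training samples."""
    mu = skin_crcb.mean(axis=0)
    sigma = np.cov(skin_crcb, rowvar=False)
    return mu, sigma

def classify_skin(frame_bgr, mu, sigma, threshold=15.0):
    """Label each pixel with the lambda test of Eq. (3); lambda >= 15 -> skin, per the text."""
    # Approximation: OpenCV's YCrCb conversion stands in for Eq. (1).
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    crcb = ycrcb[:, :, 1:3].reshape(-1, 2).astype(np.float64)
    diff = crcb - mu
    # Quadratic form (c - mu)^T Sigma^{-1} (c - mu) for every pixel at once
    lam = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(sigma), diff)
    return (lam >= threshold).reshape(frame_bgr.shape[:2])
```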


Fig. 5 Skin detection (a) Original image (b) Detection using single Gaussian model (c) Detection using elliptical boundary model

Skin models were tested on 50 indoor images of different people with a relatively simple background. The elliptical boundary model showed better results than single Gaussian modeling, as shown in Table I. False positives and true negatives were obtained by comparing the results with ground truth, which was produced by marking the skin regions manually. After the detection of a hand in the field of view, the next step is tracking the hand.

TABLE I
Results of skin detection

Modeling              False positives   True negatives
Single Gaussian       5.04 %            1.25 %
Elliptical boundary   4.18 %            1.65 %

IV. HAND TRACKING

Once the hand contour is detected in a frame and its area is significant, the bounding box of the hand contour is selected as our desired ROI (region of interest), and this rectangle is passed to the 'camshift' function to track the hand in future frames. The 'camshift' tracking algorithm, available in OpenCV, allows tracking of a moving object with varying size and shape on the basis of the distribution of any kind of feature of the object; we have used the color histogram of the hand for tracking [9]. As a first step, the image is converted into HSV color space and, for the given region of interest, a histogram is created over the hue values of the pixels. At the start of the 'camshift' tracker, the created histogram is used to assign a hand-probability value to each pixel in the current video frame. This histogram is stored and used to calculate the hand probability of each pixel in the ROI of the next frame using the histogram back-projection method of OpenCV. In the next frame, 'camshift' calculates the center of gravity of the pixels which have the highest probability of being part of the hand, and shifts the ROI center to the calculated center of gravity. 'Camshift' also calculates the size and angle of the hand bounding rectangle, so it can track the changing hand size and angle due to the relative movement of the hand towards and away from the camera [10].

The ROI given by 'camshift' does not contain the whole hand in it; rather, it contains the region where most of the skin pixels are concentrated. The boundaries of the ROI are therefore extended to enclose the whole hand. In Fig. 6, the small rectangle shows the area returned by 'camshift' and the outer rectangle is the extended ROI to which skin detection is applied. The 'convexity defects' and fingertip detection algorithms are then implemented after separating the skin pixels in the ROI of each frame.

Fig. 6 Hand tracking
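As a rough sketch of this tracking loop, the following code uses OpenCV's standard calcHist, calcBackProject and CamShift functions, which the text refers to; the histogram bin count and termination criteria are our own choices, not values given in the paper.

```python
import cv2

def track_hand(video_source, roi):
    """roi = (x, y, w, h): hand bounding box from the skin detection stage."""
    cap = cv2.VideoCapture(video_source)
    ok, frame = cap.read()
    x, y, w, h = roi
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Hue histogram of the initial hand region (32 bins is our choice)
    hist = cv2.calcHist([hsv[y:y + h, x:x + w]], [0], None, [32], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    window = roi
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Back-projection assigns each pixel a hand-probability value
        prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        # CamShift re-centers the window on the probability center of
        # gravity and adapts the window size and angle
        rot_rect, window = cv2.CamShift(prob, window, term)
        print(rot_rect)  # ((cx, cy), (w, h), angle) of the tracked hand
    cap.release()
```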

V. FINGERTIPS DETECTION

After detecting the hand region, we employed curvature based and convexity defects based fingertip detection for the segmentation of the fingertips to be used in gesture recognition.

A. Curvature based fingertips detection

After the extraction of the contour of the hand, a curvature based algorithm similar to the one described in [11][12] is used for the detection of fingertips. First, small false alarms are rejected by applying a constraint on the area of the contour. The curvature at every point of the contour is then computed as

K(P_i) = \frac{\overrightarrow{P_i P_{i-x}} \cdot \overrightarrow{P_i P_{i+x}}}{\|\overrightarrow{P_i P_{i-x}}\| \, \|\overrightarrow{P_i P_{i+x}}\|}   (5)

where P_i is the point under the curvature test, P_{i-x} is the preceding and P_{i+x} the succeeding point on the contour, and x is the displacement index. Points with K > 0.1 are considered candidates for fingertips; the best results were obtained for x = 5. Fingertips are separated from valleys by taking the direction of the curvature into account, and multiple candidate points for one fingertip are eliminated by using a distance constraint between consecutive points. The center of mass of the hand is then computed using OpenCV's implementation [13], and only those points are considered which lie above the center of mass of the hand. Finally, the most frequently detected points over 5 consecutive frames are chosen as fingertips, as shown in Fig. 7. For now we are only considering pointing and click postures, so a constraint on the distance between a fingertip and the center of mass was applied: as shown in Fig. 7, fingertips which lie outside the circle of radius equal to one eighth of the perimeter of the hand are not considered. Fig. 7(c) shows the pointing posture and Fig. 7(d) shows the click posture.

Fig. 7 Curvature based fingertips detection

B. Convex hull algorithm for fingertips detection

While using the convex hull algorithm, the convex hull of the hand contour is computed using OpenCV's implementation of [14], similar to [15]. All peaks of the convexity defects are initially considered potential fingertip candidates. The convexity defects are refined by placing a distance constraint between consecutive defects, to get one defect for each fingertip, and then between the center of mass and the fingertips, to get the required postures mentioned above. The results are shown in Fig. 8.

Fig. 8 Convexity defects based fingertips detection
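A minimal sketch of this step using OpenCV's convexHull and convexityDefects calls follows; the refinement by distance constraints is omitted and the helper name is ours.

```python
import cv2

def hull_defect_tips(contour):
    """contour: one contour from cv2.findContours, shape (N, 1, 2)."""
    # Hull as point indices, as required by convexityDefects
    hull = cv2.convexHull(contour, returnPoints=False)
    defects = cv2.convexityDefects(contour, hull)
    tips = []
    if defects is not None:
        for start, end, farthest, depth in defects[:, 0]:
            # Defect start/end points lie on the hull, near fingertip peaks
            tips.append(tuple(contour[start][0]))
            tips.append(tuple(contour[end][0]))
    return tips
```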

Overall, fingertip detection with convexity defects showed better results than curvature based detection; the comparison is shown in Table II. Testing was performed on 30 test images of different people in front of a plain background, and ground truth was obtained by marking the fingertips manually. Experiments showed that the postures shown in Fig. 8(b) and (c) performed better than that in Fig. 8(d). The performance of fingertip detection depends on the contour extracted from the segmented hand region.

TABLE II
Results of fingertips detection
Fingertips detection method   True negatives   False positives
Curvature based               20.30 %          14.81 %
Convexity defects             11.53 %          5.76 %

VI. GESTURE RECOGNITION

A gesture is a combination of different strokes drawn on the screen, where a stroke is a path formed by a sequence of 2D points. As shown in Fig. 9, if we draw a 'D' on the screen, we draw it in two strokes.

Fig. 9 Gesture of drawing 'D', a two stroke gesture

Fig. 10 Examples of uni-stroke gestures

Uni-stroke gestures are those gestures which can be drawn with a single stroke, i.e. shapes which can be drawn without lifting the pen from the paper. These include shapes like the circle, rectangle and triangle, and letters like S, N, U and C, as shown in Fig. 10. Since we are interested in real-time interaction between the user and the computer, we keep our interest limited to uni-stroke gestures, because they are easy to draw and their detection is much faster and simpler. In our gesture matching applet, all gestures are stored in a database file with the respective actions the computer will perform against those gestures. A newly drawn gesture is then compared with the database of gestures, and the best matching gesture is selected through correlation of the points in the stroke. Capturing of a gesture starts when the user opens his thumb and forefinger only. The pointer position is captured every 20 ms and stored as an array of points; a complete set of points is called a stroke. Before a gesture is stored or matched, its size, spacing and center are normalized, as described in the article on handwriting recognition at GameDev [16]. The drawn gesture size is different every time, so in the first step we normalize its size so that gestures appear at equal size in the matching process. To normalize a gesture, we find the height and width of the bounding box of the gesture and divide by the maximum of the two. The normalizing process is shown in Fig. 11.

Fig. 11 Normalization of size
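The size normalization could be implemented along these lines, assuming the stroke is stored as an (N, 2) array of pointer positions; this is a sketch, not the applet's code.

```python
import numpy as np

def normalize_size(stroke):
    """Scale the stroke so the larger bounding-box side becomes 1,
    preserving the aspect ratio, as described in the text."""
    pts = np.asarray(stroke, dtype=np.float64)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    scale = max(maxs[0] - mins[0], maxs[1] - mins[1])
    if scale == 0:
        return pts - mins  # degenerate stroke: all points coincide
    return (pts - mins) / scale
```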

As we capture the pointer position after equal intervals of time (20 ms), the spacing between captured points varies with the speed at which the gesture is drawn, so the captured points are not equally spaced. For the matching process, however, we require gestures consisting of a fixed number of points (32 points in our program). The length of the captured gesture path is therefore calculated and used in the spacing normalization: a gesture point is interpolated after every 1/31st of the original length. At the end, we get gestures containing an equal number of points, as shown in Fig. 12.


Fig. 12 Normalization of spacing between captured points (a) Quickly drawn gesture (b) Slowly drawn gesture (c) Normalized gesture
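A sketch of the spacing normalization as a resampling routine is given below; the 32-point count and the 1/31st interval come from the text, while the interpolation scaffolding is our own.

```python
import numpy as np

def resample(stroke, n_points=32):
    """Walk along the stroke and interpolate a point every 1/31st of its
    total path length, yielding 32 equally spaced points."""
    pts = np.asarray(stroke, dtype=np.float64)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    interval = seg.sum() / (n_points - 1)
    resampled = [pts[0]]
    dist_accum = 0.0
    prev = pts[0]
    i = 1
    while i < len(pts) and len(resampled) < n_points:
        d = np.linalg.norm(pts[i] - prev)
        if dist_accum + d >= interval and d > 0:
            # Place a new point on the current segment
            t = (interval - dist_accum) / d
            q = prev + t * (pts[i] - prev)
            resampled.append(q)
            prev = q
            dist_accum = 0.0
        else:
            dist_accum += d
            prev = pts[i]
            i += 1
    while len(resampled) < n_points:  # guard against rounding shortfall
        resampled.append(pts[-1])
    return np.array(resampled)
```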

Gestures are also drawn at different places on the screen, so for the matching process we have to bring them to the same point of reference; we move them to the center of the screen before matching. First, we calculate the arithmetic mean of the coordinates of the points in the gesture, which gives the center position of the gesture. Then we calculate the distance of the gesture center from the center of the screen, and bring the gesture to the screen center by simple addition and subtraction of the calculated distance from the gesture coordinates, as shown in Fig. 13.

Fig. 13 Centralization of gesture
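The centralization step could look like the following sketch; the screen-center coordinates are a placeholder assumption.

```python
import numpy as np

def centralize(stroke, screen_center=(0.5, 0.5)):
    """Translate the stroke so its centroid lands on the screen center."""
    pts = np.asarray(stroke, dtype=np.float64)
    centroid = pts.mean(axis=0)  # arithmetic mean of the coordinates
    return pts + (np.asarray(screen_center) - centroid)
```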

At the end, we match the gesture with the previously stored gestures one by one: we calculate the dot product of the newly captured gesture with every gesture in our database, and the gesture giving the highest dot product value is selected as the best match. The computer then performs the action corresponding to the matched gesture.
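The matching step could be sketched as follows; normalizing both vectors to unit length before taking the dot product is our own assumption to make scores comparable across templates, and the template storage format is hypothetical.

```python
import numpy as np

def best_match(gesture, templates):
    """gesture: normalized (32, 2) array; templates: dict name -> (32, 2) array."""
    g = np.array(gesture, dtype=np.float64).ravel()
    g /= np.linalg.norm(g) + 1e-12  # assumption: unit length for comparable scores
    best_name, best_score = None, -np.inf
    for name, tpl in templates.items():
        t = np.array(tpl, dtype=np.float64).ravel()
        score = float(np.dot(g, t / (np.linalg.norm(t) + 1e-12)))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```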

VII. IMPLEMENTED GESTURES

Pointing, click, zoom in, zoom out and window closing gestures are implemented, as shown in Fig. 14; more gestures can be added according to user requirements. A single extended forefinger is considered the pointing gesture, shown in Fig. 14(a). Showing both thumb and forefinger is the click gesture, shown in Fig. 14(b). For now, a clockwise circle is used to zoom in, an anticlockwise circle to zoom out, and drawing an alpha (cross) closes the current window.

Fig. 14 Implemented gestures

Gesture recognition was tested on eighteen different people. The results of gesture recognition are shown in Table III.

TABLE III
Results of gesture recognition
Method of gesture drawing      Recognition rate
Gesture drawn with mouse       95 %
Gesture drawn with fingertip   55 %

A 95% gesture recognition rate when gestures are drawn with the mouse shows the robustness of the gesture recognition algorithm, but the 55% recognition rate when gestures are drawn with fingertips shows that fingertip detection needs to be improved. Experiments showed that click gesture recognition is 70%, which affects the recognition of the other gestures.

VIII. CONCLUSION AND FUTURE WORK

We proposed a hand gesture based human computer interaction system that provides a very intuitive and natural way to interact with a computer. The hand is first segmented using skin color information and tracked using the 'camshift' tracker; fingertips are then located on the contour of the segmented hand, and single-stroke gestures drawn with the fingertips are recognized. For now, pointing, click, zoom in, zoom out and window closing gestures have been defined to demonstrate the application of the system. Encouraging results are produced under controlled lighting conditions and a relatively simple background. In future, we are looking to improve the hand segmentation algorithm, since the performance of fingertip detection and gesture recognition depends on the contour of the hand extracted from the segmented hand region. Secondly, we want to include more gestures so that complete interaction with the computer can be achieved using only hand gestures. We would also like to explore the use of confidence circles/ellipses around the fingertips to account for shaking and other random motions of the fingertips, or jitter in the system.

REFERENCES

[1] P. Mistry, P. Maes and L. Chang, "WUW - Wear Ur World - A Wearable Gestural Interface," in Proc. CHI Extended Abstracts on Human Factors in Computing Systems, 2009.
[2] T. Singh, "Finger Mouse," COMS W4735 Project Report, 2007.
[3] L. Jin et al., "A Novel Vision Based Finger-Writing Character Recognition System," in Proc. 18th International Conference on Pattern Recognition, 2006, pp. 1104-1107.
[4] Y. Liu, X. Liu and Y. Jia, "Hand-Gesture Text Input for Wearable Computers," in Proc. 4th IEEE International Conference on Computer Vision Systems, 2006, pp. 8-8.
[5] A. Sanghi et al., "A Fingertip Detection and Tracking System as a Virtual Mouse, a Signature Input Device and an Application Selector," in Proc. IEEE SoutheastCon, 2008, pp. 503-506.
[6] V. Vezhnevets, V. Sazonov and A. Andreeva, "A Survey on Pixel-Based Skin Color Detection Techniques," in Proc. 13th GRAPHICON, 2003.
[7] Q. Huynh-Thu, M. Meguro and M. Kaneko, "Skin-Color Extraction in Images with Complex Background and Varying Illumination," in Proc. 6th IEEE Workshop on Applications of Computer Vision, 2002, pp. 280-285.
[8] J. Y. Lee and S. I. Yoo, "An Elliptical Boundary Model for Skin Color Detection," in Proc. International Conference on Imaging Science, Systems and Technology, 2002.
[9] R. Hewitt (2007, Mar). Seeing with OpenCV. North Hollywood, CA [Online]. Available: http://www.cognotics.com/opencv/downloads/camshift_wrapper/index.html
[10] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. Sebastopol, CA: O'Reilly, 2008, pp. 337-341.
[11] Y. Hirobe et al., "Vision-based Input Interface for Mobile Devices with High-speed Fingertip Tracking," in Proc. ACM UIST, 2009, pp. 7-8.
[12] S. Malik, "Real-Time Hand Tracking and Finger Tracking for Interaction," CSC2503F Project Report, 2003.
[13] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. Sebastopol, CA: O'Reilly, 2008, pp. 253-254.
[14] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. Sebastopol, CA: O'Reilly, 2008, pp. 258-260.
[15] G. Panin, S. Klose and A. Knoll, "Real-Time Articulated Hand Detection and Pose Estimation," in Proc. 5th International Symposium on Advances in Visual Computing, 2009.
[16] O. Dopertchouk, "Recognition of Handwritten Gestures," GameDev, Jan. 2004. [Online]. Available: http://www.gamedev.net/page/resources/_/reference/programming/sweet-snippets/recognition-of-handwritten-gestures-r2039 [Accessed: 06 May 2011].
