International Journal on Emerging Trends in Technology (IJETT) ISSN: 2455 - 0124 (ONLINE) | 2350 - 0808 (PRINT) | (IF: 0.456) Volume 3 | Issue 2 | July - 2016 | Special Issue www.ijett.in

Feature Extraction and Matching Using Scale Invariant Feature Transform for Indian Sign Language
Sandeep B. Patil1, G.R. Sinha2
[email protected], [email protected]
Department of Electronics & Telecommunication1, 2
Faculty of Engineering and Technology, Shri Shankaracharya Technical Campus, Bhilai (C.G.)1, 2

ABSTRACT
Lack of awareness about deaf people widens the communication gap and fosters misconceptions and superstitions about the deaf and hard-of-hearing community. This paper describes a method for extracting distinctive features from an image and performing reliable matching between two images of different scales. We show that the features are invariant to scale, illumination and clutter, and provide robust matching for recognition. The acquired features are highly distinctive, so that a single feature can be correctly matched with high probability. The experimental results report the time required to extract the SIFT key features from the original and scale-changed images. The extracted features are then matched and the percentage accuracy is determined. The feature extraction and matching have been implemented on Indian Sign Language (ISL) biometrics.

Keywords
Indian Sign Language, hand gesture, scale invariant features, feature matching.

1. INTRODUCTION
Image matching is a very important step in any computer vision and biometrics task. This paper attempts the detection of image features that make them suitable for matching different images of the same type. It also shows that the acquired features are highly stable and invariant to illumination, rotation and scaling. The emphasis is also on acquiring stable features under different scaling factors, which makes them suitable for reliable matching [1]. The scale invariant feature transform [2][3] has been used for feature extraction. An important property of this approach is that it generates a large number of features covering the image densely. This dense set of features is adequate for object recognition under cluttered backgrounds; for correct identification, at least three features should be correctly matched from the large pool of features. For image matching, scale invariant feature transform (SIFT) features are extracted from a reference image and stored in a database. The input image used for matching is compared with the features present in the database, and a candidate is declared matched using the Euclidean distance method.
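As an illustration of this matching step, a minimal MATLAB sketch is given below. The variable names (desc_ref, desc_in) and the acceptance threshold max_dist are assumptions made for the example; the exact implementation used in this work is not specified.

% Sketch: match input-image descriptors against reference (database) descriptors.
% desc_ref : N x 128 matrix of SIFT descriptors from the reference image
% desc_in  : M x 128 matrix of SIFT descriptors from the input image
% max_dist : assumed Euclidean-distance threshold for accepting a match
function matches = match_descriptors(desc_ref, desc_in, max_dist)
    matches = [];                                    % rows of [input_idx, ref_idx, distance]
    for i = 1:size(desc_in, 1)
        % Euclidean distance from descriptor i to every reference descriptor
        d = sqrt(sum(bsxfun(@minus, desc_ref, desc_in(i, :)).^2, 2));
        [dmin, j] = min(d);
        if dmin < max_dist                           % keep only sufficiently close matches
            matches = [matches; i, j, dmin];         %#ok<AGROW>
        end
    end
end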

2. LITERATURE REVIEW
Numerous research works on ISL biometrics have been carried out, and several researchers have already contributed to this area. A study of sign language biometrics, and of ISL biometrics in particular, was made; a report of the literature survey follows.
Bhuyan et al. [9] (2011) proposed an approach for hand pose recognition that analyses the texture and key geometrical features of the hand. A skeletal hand model was constructed to analyse the abduction/adduction movements of the fingers, and texture analysis was subsequently performed to account for some inflexive finger movements. The feature extraction technique is more complex because of the abduction/adduction angles of the fingers and the internal variations for particular gestures. Ghotkar et al. [17] (2012) introduced a hand gesture recognition system to recognize the alphabets of ISL. The system consists of four modules: real-time hand tracking, hand segmentation, feature extraction and gesture recognition. A genetic algorithm was used for gesture recognition. The computational time required by the system is high, since it has to process four different modules. Rajam et al. [12] (2011) developed a sign language biometrics system for one of the south Indian languages. The system describes a set of 32 signs, each representing the binary 'UP' and 'DOWN' positions of the five fingers of the right-hand palm. This system was applied to a single user in both the



training and testing phases. The above-mentioned system is applicable to Tamil text only and is limited to the five fingers of a single hand against the same background. Yang Quan [28] (2010) introduced a recognition method for sign language using a vision-based multi-features classifier. It can exclude skin-like objects and track the moving hand more precisely in a sign language video sequence. The system first builds an appearance model of the dynamic sign language, and then an SVM classifier is used for recognition. The experiment was carried out over 30 groups of Chinese manual alphabet images, and the results showed that this appearance modelling method is simple, efficient and effective for characterizing hand gestures. The overall results show that, using a linear kernel function, the best recognition rate of 99.7% was achieved for the letter 'F'. The system has limitations in real time with dynamic hand gestures. Arulkarthick et al. [6] (2012) presented a real-time system for sign language recognition using hand gestures. After detecting hand postures, hand gestures are recognized based on features of the hand extracted using the Haar transform. A K-means clustering algorithm was used to reduce the number of extracted features, which reduced the computational complexity. This method is limited to a small database; for a large database the performance of the AdaBoost algorithm is not satisfactory. Alon et al. [7] (2009) introduced a unified framework that simultaneously performs spatial segmentation, temporal segmentation and recognition. The framework uses both bottom-up and top-down information. A gesture is recognized even when the hand location is highly ambiguous and the start and end of the gesture are unknown. The performance of this approach was evaluated on two challenging applications: recognition of hand-signed digits gestured by users wearing short-sleeved shirts, and retrieval of occurrences of signs of interest from a video database containing continuous, unsegmented signing of American Sign Language (ASL). The system is limited to five signs of ASL, and the algorithm has to pass through three major components, namely feature extraction, spatio-temporal matching and inter-model competition, which require a large processing time.

2.1 Image Acquisition
Indian sign language gestures were captured with a high-resolution (14 megapixel) digital camera, as shown in Figure 1. The database consists of 26 gestures performed by 10 different persons from various age groups. Figure 1 shows the images of class 1 only.

Figure 1. Indian sign language gestures of class 1.

2.2 Feature Extraction
For many object recognition problems it is not appropriate to detect features at a single, fixed scale, because the scale of the object is unknown. When images containing different frequency components are matched, features computed at a fixed scale are inadequate to provide reliable matching. To overcome this problem, it is better to extract features that are stable in both location and scale. The flow diagram for feature extraction is shown in Figure 2: the SIFT algorithm [1]-[5] is applied to the input image to extract stable features; the image then undergoes a scale change by 2, and after applying the SIFT algorithm again the new features are extracted. This paper also computes the time required to extract the various features and the number of matched key points between the two images.

Figure 2. Feature extraction by varying scale.

2.3 Scale Space Extrema Detection
Research on scale selection by Lindeberg [4] proposed the use of the Laplacian of Gaussian function. This work was carried further by Lowe [2], who developed a set of sub-octave Difference of Gaussian (DoG) filters to compute sub-pixel scale-space locations. In this stage the image is first convolved with Gaussian blurs at various scales, and the differences of successive blurred images are taken to produce the Difference of Gaussian images. Key points are then detected as maxima/minima of the Difference of Gaussian occurring at multiple scales. Let D(x, y, σ) be a DoG image, given by

D(x, y, σ) = L(x, y, k_i σ) − L(x, y, k_j σ)        (1)


where L(x, y, kσ) is the original image I(x, y) convolved with the Gaussian blur G(x, y, kσ) at scale kσ, i.e.

L(x, y, kσ) = G(x, y, kσ) * I(x, y)        (2)

where * is the convolution operator, G(x, y, σ) is a variable-scale Gaussian and I(x, y) is the input image. The DoG image between scales k_i σ and k_j σ is simply the difference of the Gaussian-blurred images at scales k_i σ and k_j σ. The convolved images are grouped by octave, and the value of k_i is chosen so that a fixed number of convolved images is obtained per octave. DoG images are then taken from adjacent Gaussian-blurred images within each octave. The DoG for the first octave is shown in Figure 3, and the flow diagram for the generation of several octaves is shown in Figure 4.

Figure 3. DoG for the first octave.

Figure 4. DoG generation over several octaves.

After obtaining the DoG images, key points are identified by comparing each pixel in a DoG image with its eight neighbours at the same scale and the nine corresponding neighbouring pixels in each of the two adjacent scales. A pixel whose value is the minimum or maximum among all compared pixels is selected as a candidate key point. The comparison process is shown in Figure 5.

Figure 5. Comparison of a pixel with its 8 neighbours at the same scale and 9 neighbours at each of the adjacent scales.
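A minimal MATLAB sketch of this stage is given below, assuming a grayscale image of class double and illustrative parameter values (base scale σ = 1.6, k = 2^(1/3), six blur levels in the octave); it builds one octave of Gaussian-blurred images, forms the DoG images of Eqs. (1)-(2), and flags pixels that are extrema over their 26 neighbours.

% Stand-in grayscale image; replace with a real gesture image scaled to [0, 1]
im = rand(240, 320);

% One octave of Gaussian-blurred images L(x, y, sigma)
sigma0 = 1.6; k = 2^(1/3); nScales = 6;            % illustrative values
[h, w] = size(im);
L = zeros(h, w, nScales);
for s = 1:nScales
    sigma = sigma0 * k^(s-1);
    r = ceil(3*sigma);                              % kernel radius
    [X, Y] = meshgrid(-r:r, -r:r);
    G = exp(-(X.^2 + Y.^2) / (2*sigma^2));
    G = G / sum(G(:));                              % normalised Gaussian kernel
    L(:, :, s) = conv2(im, G, 'same');              % Eq. (2): L = G * I
end
D = L(:, :, 2:end) - L(:, :, 1:end-1);              % Eq. (1): DoG images

% Candidate key points: extrema over the 3 x 3 x 3 neighbourhood (Figure 5)
cand = [];
for s = 2:size(D, 3)-1
    for y = 2:h-1
        for x = 2:w-1
            patch = D(y-1:y+1, x-1:x+1, s-1:s+1);
            v = D(y, x, s);
            if v == max(patch(:)) || v == min(patch(:))
                cand = [cand; x, y, s];             %#ok<AGROW>
            end
        end
    end
end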

2.4 Key Point Localization
Each candidate key point then undergoes refinement using a Taylor expansion of the scale-space function D(x, y, σ), shifted so that the origin lies at the sample point [2]-[7]:

D(x) = D + (∂D/∂x)ᵀ x + (1/2) xᵀ (∂²D/∂x²) x        (3)

where D and its derivatives are evaluated at the candidate key point and x = (x, y, σ)ᵀ is the offset from this point.


The location of the extremum, x̂, is determined by taking the derivative of this function with respect to x and setting it to zero, giving x̂ = −(∂²D/∂x²)⁻¹ (∂D/∂x).
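Continuing the sketch above, the refinement for one candidate at integer location (x, y, s) in the DoG stack D can be written as follows in MATLAB; the central-difference stencils below are one standard choice and are assumptions, since the paper does not list them explicitly.

% First derivatives of D at the candidate point (central differences)
dx = (D(y, x+1, s) - D(y, x-1, s)) / 2;
dy = (D(y+1, x, s) - D(y-1, x, s)) / 2;
ds = (D(y, x, s+1) - D(y, x, s-1)) / 2;
g  = [dx; dy; ds];

% Second derivatives: 3 x 3 Hessian in x, y and scale
dxx = D(y, x+1, s) - 2*D(y, x, s) + D(y, x-1, s);
dyy = D(y+1, x, s) - 2*D(y, x, s) + D(y-1, x, s);
dss = D(y, x, s+1) - 2*D(y, x, s) + D(y, x, s-1);
dxy = (D(y+1, x+1, s) - D(y+1, x-1, s) - D(y-1, x+1, s) + D(y-1, x-1, s)) / 4;
dxs = (D(y, x+1, s+1) - D(y, x-1, s+1) - D(y, x+1, s-1) + D(y, x-1, s-1)) / 4;
dys = (D(y+1, x, s+1) - D(y-1, x, s+1) - D(y+1, x, s-1) + D(y-1, x, s-1)) / 4;
H3  = [dxx dxy dxs; dxy dyy dys; dxs dys dss];

xhat = -H3 \ g;                        % offset from the sample point
Dhat = D(y, x, s) + 0.5 * g' * xhat;   % value of D(x) at the offset, used in Section 2.4.1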

2.4.1 Eliminating Low-Contrast Points
To eliminate low-contrast key points, the value of the second-order Taylor expansion D(x) is computed at the offset x̂. If |D(x̂)| is less than 0.03, the candidate key point is discarded.

2.4.2 Corner/Edge Thresholding
A poorly defined peak in the DoG function has a large principal curvature across the edge but a small one in the perpendicular direction. The principal curvatures can be computed from a 2 × 2 Hessian matrix H, given by [2][4]


H = [ D_xx   D_xy
      D_xy   D_yy ]        (4)

The derivatives are estimated by taking differences of neighbouring sample points. The eigenvalues of H are proportional to the principal curvatures of D. Let α be the eigenvalue with the larger magnitude and β the smaller one. Then

Tr(H) = D_xx + D_yy = α + β        (5)



Det(H) = D_xx D_yy − (D_xy)² = α β        (6)

Let r be the ratio between the eigenvalue of larger magnitude and the smaller one, so that α = rβ. Then

Tr(H)² / Det(H) = (α + β)² / (αβ) = (rβ + β)² / (rβ²) = (r + 1)² / r        (7)

The quantity (r + 1)²/r is at a minimum when the two eigenvalues are equal, and it increases with r. Therefore, to check that the ratio of principal curvatures is below some threshold r, we only need to check

Tr(H)² / Det(H) < (r + 1)² / r        (8)
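Continuing the same sketch, the contrast and edge tests can be expressed in a few lines of MATLAB; the curvature-ratio threshold r = 10 is Lowe's commonly quoted value and is an assumption here, since the threshold used in this work is not stated.

% Low-contrast test (Section 2.4.1) and edge test of Eq. (8)
r = 10;                                           % assumed curvature-ratio threshold
trH  = dxx + dyy;                                 % Tr(H),  Eq. (5)
detH = dxx*dyy - dxy^2;                           % Det(H), Eq. (6)
lowContrast = abs(Dhat) < 0.03;
isEdge = (detH <= 0) || (trH^2 / detH >= (r+1)^2 / r);
if lowContrast || isEdge
    % discard this candidate key point
end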

2.5 Orientation Assignment
Besides scale changes, image matching and object recognition algorithms must also deal with in-plane image rotation. This is handled by estimating a dominant orientation at each detected key point. The technique described by Lowe (2004) is to examine the histogram of orientations computed around the key point: a 36-bin histogram of gradient orientations is built, weighted by both the gradient magnitude and the Gaussian distance to the centre.

Figure 6. Orientation assignment.

The point shown in the middle of Figure 6 is the candidate key point; the orientations of the points in the square area around it are precomputed using pixel differences. Each bin of the histogram covers 10 degrees, so 36 bins cover the full 360 degrees. The value of each bin is the sum of the gradient magnitudes of all points in the window whose orientation falls within that bin. Any peak within 80% of the global maximum also yields an orientation estimate for the key point. All computations are performed on the Gaussian-smoothed image L(x, y, σ) at the key point's scale σ, so that they are carried out in a scale-invariant manner. For an image sample L(x, y) at scale σ, the gradient magnitude m(x, y) and orientation θ(x, y) are precomputed using pixel differences:

m(x, y) = √[ (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² ]        (9)

θ(x, y) = tan⁻¹[ (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) ]        (10)

Each key point is then assigned an orientation based on the local image gradient directions.
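The orientation histogram can be sketched in MATLAB as below, assuming the Gaussian-smoothed image Ls at the key point's scale sigma and a key point at (x0, y0) whose surrounding window lies inside the image; the window radius and weighting width are illustrative choices.

% 36-bin orientation histogram around a key point (x0, y0)
rad = 8; sigw = 1.5 * sigma;                      % illustrative window radius and weighting
hist36 = zeros(1, 36);
for v = y0-rad : y0+rad
    for u = x0-rad : x0+rad
        m  = sqrt((Ls(v, u+1) - Ls(v, u-1))^2 + (Ls(v+1, u) - Ls(v-1, u))^2);   % Eq. (9)
        th = atan2(Ls(v+1, u) - Ls(v-1, u), Ls(v, u+1) - Ls(v, u-1));           % Eq. (10)
        wg = exp(-((u-x0)^2 + (v-y0)^2) / (2*sigw^2));    % Gaussian distance weight
        b  = mod(floor(th / (2*pi) * 36), 36) + 1;        % 10-degree bin index
        hist36(b) = hist36(b) + wg * m;                   % magnitude-weighted vote
    end
end
[~, peak] = max(hist36);
theta0 = (peak - 0.5) * (2*pi/36);                % dominant orientation of the key point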

2.6 Key Point Descriptor
SIFT features are now formed by computing the gradient at each pixel in a 16 × 16 window around the detected key point. The gradient magnitudes are down-weighted by a Gaussian fall-off function (shown by the blue circles in Figure 7) to reduce the influence of gradients far from the centre. In each 4 × 4 quadrant, a gradient orientation histogram is formed by adding the weighted gradient value to one of 8 orientation histogram bins. To reduce the effect of misestimating the location and the dominant orientation, each of the original 256 weighted gradient magnitudes is softly added to 2 × 2 × 2 histogram bins using trilinear interpolation. This results in 4 × 4 × 8 = 128 non-negative values that form the raw version of the SIFT descriptor vector.
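A simplified MATLAB sketch of the descriptor is given below: it accumulates an 8-bin orientation histogram in each of the 4 × 4 cells of a 16 × 16 window around the key point, with Gaussian down-weighting, but omits the rotation to the dominant orientation and the trilinear (soft) binning described above, so it is an illustration rather than the full procedure.

% Simplified 4 x 4 x 8 = 128-dimensional SIFT-like descriptor for a key point
% at (x0, y0) in the Gaussian-smoothed image Ls (window assumed inside the image)
desc = zeros(4, 4, 8);
for v = 1:16
    for u = 1:16
        yy = y0 - 8 + v;  xx = x0 - 8 + u;
        m  = sqrt((Ls(yy, xx+1) - Ls(yy, xx-1))^2 + (Ls(yy+1, xx) - Ls(yy-1, xx))^2);
        th = atan2(Ls(yy+1, xx) - Ls(yy-1, xx), Ls(yy, xx+1) - Ls(yy, xx-1));
        wg = exp(-((u-8.5)^2 + (v-8.5)^2) / (2*8^2));     % Gaussian fall-off from the centre
        cb = mod(floor(th / (2*pi) * 8), 8) + 1;          % one of 8 orientation bins
        desc(ceil(v/4), ceil(u/4), cb) = desc(ceil(v/4), ceil(u/4), cb) + wg * m;
    end
end
desc = desc(:)';                        % raw 128-dimensional descriptor vector
desc = desc / max(norm(desc), eps);     % normalise to unit length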



Figure 7. Extracting the key point descriptor: (a) image gradients; (b) key point descriptor. Image source: Lowe (2004).



The final key point descriptors for the first five gestures are shown in Figure 8.

Figure 8. Original images of the first five gestures and their detected key points: (a) Gesture A, 73 key points; (b) Gesture B, 71; (c) Gesture C, 57; (d) Gesture D, 94; (e) Gesture E, 108.

Once the key point descriptors have been detected, the original image undergoes a scale change by 2; the key point descriptors detected after the scale change are then matched against the key point descriptors of the original image. Figure 9 shows the number of key point descriptors after the scale change and how many key points exactly match the key points of the original image. Here count is the number of key points common to the two images and kp is the maximum number of key points located in an image. The matching percentage mp is calculated as

mp = (count / kp) × 100        (11)

Figure 9. Scale change by 2 for the first four gestures and their detected key points: (a) Gesture A, 75 key points; (b) Gesture B, 109; (c) Gesture C, 113; (d) Gesture D, 142.

Algorithm for comparing the key point locations of two images (kpl and kpl2 hold the interleaved (x, y) coordinates of the key points of the two images):

tic
count = 0;
for i = 1:2:length(kpl)
    for j = 1:2:length(kpl2)
        if (kpl(i) == kpl2(j)) && (kpl(i+1) == kpl2(j+1))
            count = count + 1;
            break;
        end
    end
end
mp = (count/length(kp))*100;
toc

3. Performance Evaluation
This paper also measures the computation time for the various phases of the SIFT algorithm. First, we compute the time required to extract the features from the reference image; the average time for feature extraction is about 3 seconds. The reference image then undergoes a scale change by 2, and the time required to compute the SIFT features of the scale-changed image is again about 3 seconds. Finally, the time for reliable matching is approximately 0.2 ms. The matching percentage between the two images by key point location is shown in Table 1.
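The timing figures in Table 1 could be collected with MATLAB's tic/toc in the manner sketched below; extract_sift_keypoints is a placeholder name for the feature extraction stage of Section 2 (it is not a function provided in this paper), and imresize(im, 2) stands in for the scale change by 2.

% Hedged timing sketch for one gesture image im (grayscale, class double)
tic; kpl  = extract_sift_keypoints(im);           t1 = toc;   % SIFT keys, 1st image
im2 = imresize(im, 2);                                        % scale change by 2
tic; kpl2 = extract_sift_keypoints(im2);          t2 = toc;   % SIFT keys, 2nd image

tic;
count = 0;
for i = 1:2:length(kpl)                           % comparison loop from above
    for j = 1:2:length(kpl2)
        if (kpl(i) == kpl2(j)) && (kpl(i+1) == kpl2(j+1))
            count = count + 1;
            break;
        end
    end
end
mp = (count / max(length(kpl), length(kpl2))) * 100;   % assumed reading of kp in Eq. (11)
t3 = toc;                                         % time to compute the matching percentage
fprintf('SIFT 1st: %.3f s, 2nd: %.3f s, matching: %.6f s, mp = %.2f%%\n', t1, t2, t3, mp);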


Table 1. Computation time for the various phases of the SIFT algorithm and the matching percentage for each hand gesture.

Hand gesture | Time to compute SIFT keys, 1st image (s) | Scale change | Time to compute SIFT keys, 2nd image (s) | Time to compute matching percentage (s) | Matching percentage between the two images by key point location
A | 2.989362 | 2 | 2.966718 | 0.000232 | 66.97247
B | 2.980283 | 2 | 3.106961 | 0.000258 | 62.83185
C | 2.981861 | 2 | 3.069738 | 0.000167 | 76.00000
D | 3.002684 | 2 | 3.314499 | 0.000343 | 66.19718
E | 3.014743 | 2 | 3.144527 | 0.000331 | 71.05263
F | 3.014216 | 2 | 3.095023 | 0.000279 | 72.51908
G | 3.008967 | 2 | 3.152557 | 0.000381 | 67.92452
H | 3.085294 | 2 | 3.184808 | 0.000828 | 59.84848
I | 3.000123 | 2 | 3.077228 | 0.000197 | 60.71428
J | 2.951115 | 2 | 3.075485 | 0.000163 | 73.97260
K | 2.986478 | 2 | 3.093422 | 0.000278 | 64.56692
L | 2.964468 | 2 | 3.085828 | 0.000204 | 67.70833
M | 2.974702 | 2 | 3.123334 | 0.000192 | 69.90291
N | 2.980891 | 2 | 3.099554 | 0.000241 | 64.22018
O | 2.938539 | 2 | 3.059080 | 0.000148 | 74.19354
P | 2.993993 | 2 | 3.092934 | 0.000285 | 72.35772
Q | 2.996639 | 2 | 3.086430 | 0.000282 | 64.51612
R | 2.999799 | 2 | 3.063274 | 0.000222 | 67.32673
S | 2.991199 | 2 | 3.050137 | 0.000234 | 69.44444
T | 2.979422 | 2 | 3.066865 | 0.000227 | 65.71428
U | 2.968540 | 2 | 3.039240 | 0.000161 | 62.31884
V | 2.958453 | 2 | 3.060010 | 0.000191 | 73.86363
W | 2.969046 | 2 | 3.067797 | 0.000182 | 65.16853
X | 2.968251 | 2 | 3.042073 | 0.000148 | 68.49315
Y | 3.104574 | 2 | 3.059575 | 0.000280 | 68.25396
Z | 2.995073 | 2 | 3.108791 | 0.000295 | 67.91044

It can be seen from Table 1 that, on average, about 70 percent of the key points are matched between the original and the scale-changed image, and that the time taken to compute the matching percentage is around 0.2 ms.

4. Conclusion
Hand gestures are the most expressive way of communication within the deaf community, and hand gesture recognition is an area of interest for many researchers. This paper has used the SIFT algorithm to extract stable features that are invariant to scale, clutter and rotation. It has shown how to extract stable features from a reference image and a scale-changed image, compare these features and compute the time required for each phase. Reliable matching has been achieved in about 0.2 ms.

REFERENCES
[1] Sandeep B. Patil and G.R. Sinha, "Intensity Based Distinctive Feature Extraction and Matching Using Scale Invariant Feature Transform for Indian Sign Language", 17th International Conference on Mathematical Methods, Computational Techniques and Intelligent Systems (MAMECTIS '15), Tenerife, Canary Islands, Spain, January 10-12, 2015, pp. 245-255, ISBN: 978-1-61804-281-1.
[2] David G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60(2), 2004, pp. 91-110. http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
[3] David G. Lowe, "Object recognition from local scale-invariant features", International Conference on Computer Vision, Corfu, Greece, September 1999, pp. 1150-1157. http://www.cs.ubc.ca/~lowe/papers/iccv99.pdf
[4] Tony Lindeberg, "Feature detection with automatic scale selection", International Journal of Computer Vision, 30(2), 1998, pp. 77-116.
[5] G.R. Sinha and Sandeep B. Patil, "Biometrics: Concepts and Applications", Wiley India Pvt. Ltd., ISBN-13: 978-8126538652, ISBN-10: 8126538651, 2013.
[6] Arulkarthick V.J, Sangeetha D. and Umamaheswari S, 2012, "Sign Language Recognition using K-Means Clustered Haar-Like Features and a Stochastic Context Free Grammar", European Journal of Scientific Research, 78(1): 74-84.
[7] Alon J. and Athitsos V, 2009, "A Unified Framework for Gesture Recognition and Spatiotemporal Gesture Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9): 1685-1699.

[8] Ulrike Zeshan, Madan M. Vasishta and Meher Sethna, "Implementation of Indian Sign Language in Educational Settings", Asia Pacific Disability Rehabilitation Journal, 16(1), pp. 16-40, 2005.
[9] Bhosekar A, Kadam K, Ganu R. and Joshi S.D, 2012, "American Sign Language Interpreter", IEEE Fourth International Conference on Technology for Education, 6(12): 157-159.
[10] Bhuyan M.K, Kar M.K and Debanga R.N, 2011, "Hand Pose Identification from Monocular Image for Sign Language Recognition", IEEE International Conference on Signal and Image Processing Applications (ICSIPA 2011), 3(12): 378-383.
[11] Bharti P, Kumar D and Singh S, 2012, "Sign Language to Number by Neural Network", International Journal of Computer Applications, 40(10): 38-45.
[12] Balakrishnan G and Rajam P.S, 2011, "Real Time Indian Sign Language Recognition System to Aid Deaf-Dumb People", IEEE International Conference on Computing, Communication and Networking Technologies (ICCCNT): 737-742.
[13] Chiu Y.H, Cheng C.J, Su H.W and Wu C.H, 2007, "Joint Optimization of Word Alignment and Epenthesis Generation for Chinese to Taiwanese Sign Synthesis", IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1): 28-39.
[14] Chakraborty P, Mondal S, Nandy A and Prasad J.S, 2010, "Recognizing and Interpreting Indian Sign Language Gesture for Human Robot Interaction", International Conference on Computer & Communication Technology (ICCCT'10), 52(11): 712-717.
[15] Chen X and Zhou Y, 2010, "Adaptive Sign Language Recognition with Exemplar Extraction and MAP/IVFS", IEEE Signal Processing Letters, 17(3): 297-300.
[16] Chung H.W, Chiu Y.H and Chi-Shiang Guo, 2004, "Text Generation from Taiwanese Sign Language Using a PST-Based Language Model for Augmentative Communication", IEEE Transactions on Neural Systems and Rehabilitation Engineering, 12(4): 441-454.
[17] Debevc M, Kosec P and Rotovnik M, 2009, "Accessible Multimodal Web Pages with Sign Language Translations for Deaf and Hard of Hearing Users", 20th International Workshop on Database and Expert Systems Application, 3(9): 279-283.
[18] Dasgupta T, Basu A and Mitra P, 2009, "English to Indian Sign Language Machine Translation System: A Structure Transfer Framework", International Conference on Artificial Intelligence (IICAI): 118-124.
[19] Ghotkar A.S, Hadap M, Khatal R and Khupase S, 2012, "Hand Gesture Recognition for Indian Sign Language", International Conference on Computer Communication and Informatics (ICCCI-2012), Coimbatore, India, 2(4): 9-12.
[20] Gaolin F, Wen G and Zhao D, 2007, "Large-Vocabulary Continuous Sign Language Recognition Based on Transition-Movement Models", IEEE Transactions on Systems, Man, and Cybernetics, 37(1): 1-9.
[21] Habili N and Moini A, 2004, "Segmentation of the Face and Hands in Sign Language Video Sequences Using Color and Motion Cues", IEEE Transactions on Circuits and Systems for Video Technology, 14(8): 1086-1097.
[22] Huang Q, Kong D, Li J, Sun Y, Wang L and Zhang X, 2012, "Synthesis of Chinese Sign Language Prosody Based on Head", International Conference on Computer Science and Service System, 1(8): 1860-1864.
[23] Ittisam P and Toadithep N, 2010, "3D Animation Editor and Display Sign Language System Case Study: Thai Sign Language", 2(3): 633-637.
[24] Patil S.B and Sinha G.R, 2012, "Real Time Handwritten Marathi Numerals Recognition Using Neural Network", International Journal of Information Technology and Computer Science, 4(12): 76-81.
[25] Patil S.B, Sinha G.R and Thakur K, 2012, "Isolated Handwritten Devnagri Character Recognition Using Fourier Descriptor and HMM", International Journal of Pure and Applied Sciences and Technology (IJPAST), 8(1): 69-74.
[26] Patil S.B and Sinha G.R, 2012, "Off-line Mixed Devnagri Numeral Recognition Using Artificial Neural Network", Advances in Computational Research, Bioinfo Journal, 4(1): 38-41.
[27] Patil S.B, Sinha G.R and Patil V.S, 2011, "Isolated Handwritten Devnagri Numerals Recognition Using HMM", IEEE Second International Conference on Emerging Applications of Information Technology, Kolkata, ISBN: 978-1-4244-9683-9: 185-189.
[28] Quan Y, 2010, "Chinese Sign Language Recognition Based on Video Sequence Appearance Modeling", 5th IEEE Conference on Industrial Electronics and Applications, ISBN 978-1-4244-5046-6/10: 1537-1542.

