7th IEEE International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), The Institute of Electrical and Electronics Engineers Inc. (IEEE) – Philippine Section, 12-16 November 2013, Hotel Centro, Puerto Princesa, Palawan, Philippines

Hand Initialization and Tracking Using a Modified KLT Tracker for a Computer Vision-Based Breast Self-Examination System

Rey Anthony A. Masilang, Melvin K. Cabatuan, Elmer P. Dadios
De La Salle University, Manila, Philippines
[email protected], [email protected], [email protected]

Abstract—This paper presents a new algorithm for tracking the hand during palpation in a breast self-examination video capture using a modified KLT feature tracker. This is implemented primarily using Shi-Tomasi corner detection and Lucas-Kanade optical flow. A novel hand initialization technique was developed using Shi-Tomasi corner detection, outlier elimination, ellipse fitting, and target estimation in order to locate the finger pads specifically. Continuous hand tracking is then achieved using Lucas-Kanade optical flow and a novel evaluation and screening of displacement vectors. A dataset of 14 video sequences was used to test the performance of the proposed algorithm. Experiments revealed efficient tracking capability of the algorithm, with an overall F-score of 94.61%.

Index Terms—Breast self-examination, computer vision-based BSE, corner detection, optical flow, hand tracking, KLT tracker

I. INTRODUCTION

Breast cancer is one of the most prominent causes of death among women aged 15 years and above. Breast cancer incidence across the globe increased from 641,000 cases in 1980 to 1,643,000 cases in 2010, an annual rate of increase of 3.1%. Around 425,000 women died of breast cancer in 2010 alone; 68,000 of them were women aged 15-49 years in developing countries [1]. The curability of breast cancer depends on many factors, one of which is how early the disease is detected, diagnosed, and classified. Various imaging tests and procedures are used to detect breast cancer; however, most of these procedures are costly and not readily available, especially in developing countries [2]. Breast self-examination (BSE) is a method that women can perform themselves and that requires no special procedures or personnel. Though not all breast cancer cases can be found this way, BSE is still an important tool for detecting breast cancer early, when the chance of cure is still high [3]. There has been some debate over how valuable BSE is in the early detection of breast cancer and improved survival. The apparent beneficial effects of BSE are, to an extent, negated by the poor proficiency of women who practice BSE and their lack of knowledge of the technique [4], which can be attributed to a lack of proper training and guidance.

Therefore, a computer vision-based system for BSE would be very useful in training and guiding women toward efficient performance of BSE [5][6][7]. In this paper, a new algorithm for initializing and tracking the hand of the user in a BSE video capture is presented. The algorithm uses a novel method based on the Kanade-Lucas-Tomasi (KLT) feature tracker. This hand initialization and tracking algorithm will be useful for a computer vision-based breast self-examination system. Its primary purpose is to track hand activity during the BSE procedure and to analyze the motion trajectories of the hand to evaluate whether the user is performing proper palpation. Ultimately, this will be used to determine whether the user has thoroughly examined all the necessary sections of the breast area.

II. HAND TRACKING IN BSE

Tracking the hand in breast self-examination using computer vision is a very challenging task. Existing methods developed specifically for this application use background subtraction or special color markers [8][9]. The former uses absolute image subtraction to determine the pixels corresponding to the hand [8]. This method, however, treats everything aside from the hand, including the breast area, as background; since the breast itself can be highly deformed in a BSE video sequence, the applicability of this method is limited. The latter method uses colored fingernails or colored tapes on the fingers to track the hand [9]. Using color filters, the location of the fingers is determined. This, however, strictly requires the user to wear color markers, which is not user-friendly. The main difficulties in BSE hand tracking are the color similarity between the hand and the breast area, and the high deformability of the hand, which has 27 degrees of freedom and is also subject to abrupt changes in orientation [10]. Widely used object tracking algorithms, specifically CAMShift and TLD, were tested on BSE hand tracking. The results show that these methods perform poorly in this specific application due to the two difficulties described above.


Fig. 1. General Block Diagram of the System

III. PROPOSED TRACKING METHOD

Corner features, or keypoints, are the primary features used to detect and locate the finger pads, as well as to track the hand across succeeding frames. This approach addresses the two major problems of BSE hand tracking: the high deformability of the hand and its similarity in color to the breast background. The proposed method is divided into two main stages, initialization and tracking. The entire algorithm is summarized in Fig. 1 and explained in the succeeding sections.

A. Preprocessing

The input to the system is a BSE video. The frames are extracted first, and each frame undergoes a simple preprocessing stage. Each frame is scaled down to 640x480 resolution; this resolution is selected to optimize the processing speed of the algorithm without negatively affecting the performance of the corner detection process. Each frame is then converted to grayscale, since color information is discarded due to the similarity between the hand and the breast area. Finally, a Gaussian filter is used to smooth each frame and minimize the detection of irrelevant features.

B. Hand Initialization

The first step in tracking the hand is to detect its presence within the breast region. Shi-Tomasi corner detection is used to detect strong corner features within the breast region, as shown in Fig. 2 (upper left). The detected corners constitute a feature set.
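Since the paper includes no code, the following is a minimal OpenCV sketch (Python) of the preprocessing and corner detection steps just described. The detector parameters (maxCorners, qualityLevel, minDistance) and the Gaussian kernel size are assumed values, not taken from the paper.

```python
import cv2
import numpy as np

def preprocess(frame):
    """Scale to 640x480, convert to grayscale, and smooth (Sec. III-A)."""
    frame = cv2.resize(frame, (640, 480))
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 5x5 Gaussian kernel is an assumption; the paper does not give a size.
    return cv2.GaussianBlur(gray, (5, 5), 0)

def detect_corners(gray, roi_mask=None):
    """Shi-Tomasi corner detection within the breast region (Sec. III-B).
    All detector parameters below are assumptions."""
    corners = cv2.goodFeaturesToTrack(
        gray, maxCorners=200, qualityLevel=0.01, minDistance=7, mask=roi_mask)
    # Return an (N, 2) array of (x, y) corner coordinates.
    if corners is None:
        return np.empty((0, 2), np.float32)
    return corners.reshape(-1, 2)
```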

Fig. 2. Visual representation of the Hand Initialization stage

Outlying corners are eliminated using a simple filtering method. First, the centroid of the feature set and its standard deviations along the horizontal and vertical axes are calculated. If either the horizontal or vertical distance of a feature from the centroid exceeds the corresponding standard deviation, that feature is identified as an outlier and filtered out. Figure 2 (upper right) shows the eliminated features as red dots and the remaining features as blue dots. The number of remaining features is compared to a predefined threshold to determine whether the hand is present. If the size of the feature set is too small, the original features were mostly scattered, which indicates that the hand is not present; in that case, the detection step is repeated until the hand is detected. Otherwise, the hand is considered present, because most of the detected features lie along the area of the hand.
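Continuing the sketch above, the outlier filter and presence check might look as follows; the minimum feature count is an assumed placeholder, as the paper does not state its threshold.

```python
import numpy as np

MIN_FEATURES = 30  # presence threshold; assumed, not given in the paper

def filter_outliers(pts):
    """Drop features farther than one standard deviation from the
    feature-set centroid along either axis (Sec. III-B)."""
    centroid = pts.mean(axis=0)
    std = pts.std(axis=0)
    keep = np.all(np.abs(pts - centroid) <= std, axis=1)
    return pts[keep]

pts = filter_outliers(detect_corners(gray))
hand_present = len(pts) >= MIN_FEATURES  # if False, repeat detection on the next frame
```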


The remaining features are drawn on a virtual feature map, and the ellipse that best fits the feature set is calculated using the method developed in [11], as shown in Fig. 2 (lower left). The location of the finger pads is estimated to lie along the major axis of the best-fitting ellipse, at a distance of 25% of the major-axis length from one of its two endpoints, as shown in Fig. 2 (lower right). If the left hand is used, the right endpoint of the major axis serves as the reference endpoint; otherwise, the left endpoint is selected. (A code sketch of this estimation appears at the end of this section.)

C. Hand Tracking

For the tracking stage, a sparse optical flow is used. Sparse optical flow processes only a selected set of pixels rather than the whole image; in this case, the corners extracted in the hand initialization stage are used to calculate the optical flow and estimate hand motion. Before the optical flow is calculated, however, the current feature set is filtered once more. This is done by defining a neighborhood of features through a circular section of the image centered at the estimated coordinates of the finger pads, as shown in Fig. 3 (upper left); features outside this neighborhood are disregarded. After this, the optical flow between the two consecutive frames is calculated using the Lucas-Kanade method. The goal here is to estimate the motion of the hand from one frame to the next. The displacements of the individual pixels of the hand are good candidates for representing the actual displacement of the hand, but for this to be accurate, only the most reliable displacement vectors must be selected; hence, some displacement vectors are filtered out. Invalid displacement vectors come in two kinds: very large and very small. Displacement vectors that are very large in comparison to the rest, shown as red arrows in Fig. 3 (lower left), most likely correspond to erroneous optical flow estimates. Displacement vectors that are very small in comparison to the rest, shown as red arrows in Fig. 3 (lower right), belong to relatively non-moving features that are most likely not part of the hand. Both types of vectors are removed. At this point, ideally, the remaining vectors and features are those that truly represent the hand and its motion between the two consecutive frames. The displacement of the hand is taken as the aggregate of the remaining displacement vectors, calculated simply as their average, as shown in Fig. 3 (upper right).

D. Tracking Reinforcement

Over the course of tracking, the feature set undergoes several filtering steps, so many features are removed. The surviving features also tend to cluster very close to one another and thus may not sufficiently represent the hand. Therefore, to keep the tracking continuous, the feature set must be updated and replenished accordingly. First, features that are very close to other features are removed. Then, additional features, detected with the same Shi-Tomasi method, are automatically appended. This new feature set is used in the next iteration of the hand tracking stage.
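As referenced above, here is a sketch of the finger-pad estimation from Section III-B. OpenCV's cv2.fitEllipse stands in for the conic-fitting method of [11], and selecting the reference endpoint by x-coordinate is our interpretation of the left/right rule; both are assumptions.

```python
import cv2
import numpy as np

def estimate_finger_pads(pts, left_hand=True):
    """Fit an ellipse to the inlier features (needs at least 5 points) and
    place the finger-pad target 25% of the major-axis length in from the
    reference endpoint (Sec. III-B)."""
    (cx, cy), (w, h), angle = cv2.fitEllipse(pts.astype(np.float32))
    # The rotated rect's width lies along `angle` degrees, its height along angle+90;
    # the longer of the two is the ellipse's major axis.
    if w >= h:
        major, theta = w, np.deg2rad(angle)
    else:
        major, theta = h, np.deg2rad(angle + 90.0)
    d = np.array([np.cos(theta), np.sin(theta)])
    e1 = np.array([cx, cy]) - 0.5 * major * d   # one major-axis endpoint
    e2 = np.array([cx, cy]) + 0.5 * major * d   # the other endpoint
    # Left hand -> reference is the right (larger-x) endpoint, and vice versa.
    ref, far = (e1, e2) if (e1[0] > e2[0]) == left_hand else (e2, e1)
    return ref + 0.25 * (far - ref)             # estimated finger-pad location
```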
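And a sketch of one tracking iteration covering Sections III-C and III-D. The neighborhood radius, Lucas-Kanade window size, and the median-relative cutoffs for screening large and small vectors are all assumed values; the paper describes the screening idea but gives no numeric thresholds.

```python
import cv2
import numpy as np

def track_step(prev_gray, gray, pts, finger_pads, radius=80.0):
    """One iteration: neighborhood filter, LK optical flow, vector screening."""
    # Keep only features inside a circle around the finger-pad estimate.
    pts = pts[np.linalg.norm(pts - finger_pads, axis=1) <= radius]
    p0 = pts.reshape(-1, 1, 2).astype(np.float32)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, p0, None, winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    old, new = p0.reshape(-1, 2)[ok], p1.reshape(-1, 2)[ok]
    disp = new - old
    mag = np.linalg.norm(disp, axis=1)
    med = np.median(mag)
    # Screen out very large and very small vectors relative to the rest
    # (0.2x and 3x the median magnitude are assumed cutoffs).
    valid = (mag >= 0.2 * med) & (mag <= 3.0 * med)
    if not valid.any():
        return None, None  # track lost; caller re-runs initialization
    hand_motion = disp[valid].mean(axis=0)  # aggregate hand displacement
    return new[valid], finger_pads + hand_motion
```

In a full implementation, the reinforcement step would then prune features lying within a few pixels of one another and append fresh Shi-Tomasi corners before the next call.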

Fig. 3. Visual representation of the Hand Tracking stage

Fig. 4. Sample frames from the 14 test video sequences showing the tracking results (red circle) using the proposed algorithm

The tracking stage is repeated until the algorithm loses track of the hand, at which point the initialization stage must be repeated.

IV. EXPERIMENT RESULTS

The proposed algorithm was tested on 14 video sequences of breast palpation, taken from the videos of four different women performing BSE documented by the CHED-PHERNet Breast Cancer Research Group. Figure 4 shows sample tracking results, drawn as red circles. For each video sequence, the ground-truth coordinates of the hand in every frame were manually annotated, and the performance of the algorithm is based on comparing the tracking results to this ground truth. Performance is measured in terms of F-score and deviation. These two metrics are computed using the following equations.


Fig. 5. Performance in terms of precision of the proposed algorithm using the 14 test video sequences

Fig. 7. Performance in terms of F-score of the proposed algorithm using the 14 test video sequences

Fig. 6. Performance in terms of recall of the proposed algorithm using the 14 test video sequences

Fig. 8. Performance in terms of deviation of the proposed algorithm using the 14 test video sequences

$$\text{precision} = \frac{N_{tp}}{N_{tp} + N_{fp}} \qquad (1)$$

$$\text{recall} = \frac{N_{tp}}{N_{tp} + N_{fn}} \qquad (2)$$

$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \qquad (3)$$

$$\text{deviation} = \frac{1}{|M_s|} \sum_{i \in M_s} \lVert p_i - g_i \rVert \qquad (4)$$

The F-score is a measure of tracking accuracy that considers both the precision and the recall of the tracker, while the deviation is the distance of the tracking result from the ground truth [11]. Equations 1, 2, and 3 show how the F-score is computed, where Ntp is the number of true positives, Nfp the number of false positives, and Nfn the number of false negatives. Equation 4 shows how the deviation is computed, where Ms denotes the set of frames in which the hand is correctly tracked, p_i the tracked hand location in frame i, and g_i the corresponding ground-truth location. For this problem, the criterion for correct tracking is a maximum error distance of 50 pixels from the ground truth.
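A sketch of how these metrics might be computed from per-frame results; the true/false-positive bookkeeping below is one reading of the 50-pixel criterion, not code from the paper.

```python
import numpy as np

def evaluate(tracked, truth, max_err=50.0):
    """tracked, truth: (N, 2) arrays of per-frame hand locations, with NaN
    rows where the tracker reports nothing or the hand is absent.
    Returns (precision, recall, f_score, deviation) per Eqs. (1)-(4)."""
    has_track = ~np.isnan(tracked).any(axis=1)
    has_truth = ~np.isnan(truth).any(axis=1)
    err = np.linalg.norm(tracked - truth, axis=1)
    tp = has_track & has_truth & (err <= max_err)  # frames in M_s
    n_tp = tp.sum()
    n_fp = (has_track & ~tp).sum()                 # tracked, but wrong or no hand
    n_fn = (has_truth & ~tp).sum()                 # hand present, but missed
    precision = n_tp / (n_tp + n_fp)
    recall = n_tp / (n_tp + n_fn)
    f_score = 2 * precision * recall / (precision + recall)
    deviation = err[tp].mean()                     # mean error over M_s
    return precision, recall, f_score, deviation
```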

The proposed algorithm's performance in terms of precision, recall, F-score, and deviation for the 14 video sequences is summarized in Figs. 5-8. As shown in Fig. 5, the algorithm's precision is consistently high, ranging from 84% to 100%; a high precision implies that the tracking results are mostly correct and accurate. As shown in Fig. 6, the algorithm's recall for the 14 video sequences is generally high, ranging from 79% to 99%; a high recall implies that the algorithm was able to track the hand accurately in the majority of the frames of each video sequence. With both precision and recall generally high for each test video sequence, the F-score for each video is also consistently high, ranging from 81% to 99%, as shown in Fig. 7. Lastly, as shown in Fig. 8, the average deviation for each video sequence is well below the 50-pixel criterion for successful tracking. This means that the tracked location of the hand per frame is relatively near the true location for the majority of the frames. Overall, the proposed algorithm has a precision of 96.53% and a recall of 92.77%. The average deviation over the entire test dataset is approximately 24.33 pixels. While this might seem like a significant amount of deviation, the high recall shows that the algorithm is capable of continuous tracking despite the constant presence of deviation from the ground truth. Taking precision and recall together, the proposed tracking algorithm yielded an overall F-score of 94.61%, which demonstrates that the proposed method is capable of continuous and efficient tracking of the hand during breast self-examination.

V. CONCLUSION AND FUTURE WORK

In this paper, a new algorithm for initializing and tracking the hand in a video capture of breast self-examination was presented. Using a modified KLT tracker implemented with a combination of Shi-Tomasi corner detection, Lucas-Kanade optical flow, and a novel approach to selecting and filtering features, the proposed algorithm was shown to be capable of efficient hand tracking with an overall F-score of 94.61%.


The developed hand initialization and tracking algorithm paves the way for future work on supervising breast self-examination using computer vision. One application of this algorithm to BSE supervision is evaluating which region of the breast is currently being examined by the user. Another possible application is analyzing hand motions to evaluate whether the user is examining her breasts in the correct manner. Ultimately, this algorithm can be used to extend the role of computer vision in BSE from passive supervision to active supervision, characterized by the capability to provide corrective feedback that guides the user appropriately.

VI. ACKNOWLEDGMENT

The authors would like to thank the Engineering Research and Development for Technology (ERDT) Consortium for funding this research.

VII. REFERENCES

[1] M.H. Forouzanfar, K.J. Foreman, A.M. Delossantos, R. Lozano, A.D. Lopez, C.J.L. Murray, and M. Naghavi, "Breast and cervical cancer in 187 countries between 1980 and 2010: a systematic analysis," The Lancet, vol. 378, no. 9801, pp. 1461-1484, 2011.
[2] A.B. Nover, S. Jagtap, W. Anjum, H. Yegingil, W.Y. Shih, W. Shih, and A.D. Brooks, "Modern breast cancer detection: a technological review," Int'l Journal of Biomedical Imaging, vol. 2009, 2009.
[3] C.M. Huguley Jr. and R.L. Brown, "The value of breast self-examination," Cancer, vol. 47, no. 5, pp. 989-995, 1981.

[4] H.L. Howe, "Proficiency in performing breast self-examination," Patient Counselling and Health Education, vol. 2, no. 4, pp. 151-153, 1980.
[5] A. Oikonomou, S. Amin, R.N.G. Naguib, A. Todman, and H. Al-Omishy, "Breast self examination training through the use of multimedia: a benchmark multimedia development methodology for biomedical applications," Proc. 23rd IEEE-EMBS, Istanbul, Turkey, 2001.
[6] A. Oikonomou, S.A. Amin, R.N.G. Naguib, A. Todman, and H. Al-Omishy, "Breast self examination training through the use of multimedia: a comparison between multimedia development approaches," Proc. IEEE-EMBS UK & RoI PG Conference in Biomedical Engineering and Medical Physics, 2002.
[7] A. Oikonomou, S.A. Amin, R.N.G. Naguib, A. Todman, and H. Al-Omishy, "A prototype multimedia application for breast self examination training," Proc. Second Joint 24th Annual Conference and the Annual Fall Meeting of the Biomedical Engineering Society and Engineering in Medicine and Biology, 2002.
[8] Y. Hu, R.N.G. Naguib, A.G. Todman, S.A. Amin, A. Oikonomou, H. Al-Omishy, and N. Tucker, "Hand motion segmentation against skin colour background in breast awareness applications," Proc. 26th Annual International Conference of the IEEE EMBS, 2004.
[9] J. Zeng, Y. Wang, M. Freedman, and S.K. Mun, "Color-feature-based finger tracking for breast palpation quantification," Proc. 1997 IEEE International Conference on Robotics and Automation, 1997.
[10] G. Elkoura and K. Singh, "Handrix: animating the human hand," Proc. 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2003.
[11] A.W. Fitzgibbon and R.B. Fisher, "A buyer's guide to conic fitting," Proc. 5th British Machine Vision Conference, Birmingham, 1995, pp. 513-522.
